Abstract



A central problem in the operation of large wireless networks is how to deal 
with interference - the unwanted signals being sent by transmitters that a 
receiver is not interested in. This thesis looks at ways of combating such in- 
terference. 

In Chapters 1 and 2, we outline the necessary information and commu- 
nication theory background. We define the concept of capacity - the highest 
rate at which information can be sent through a network with arbitrarily low 
probability of error. We also include an overview of a new set of schemes 
for dealing with interference known as interference alignment, paying special 
attention to a channel-state-based strategy called ergodic interference align- 
ment. 

In Chapter 3, we consider the operation of large regular and random net- 
works by treating interference as background noise. We consider the local 
performance of a single node, and the global performance of a very large net- 
work. 

In Chapter 4, we use ergodic interference alignment to derive the asymp- 
totic sum-capacity of large random dense networks. These networks are de- 
rived from a physical model of node placement where signal strength decays 
over the distance between transmitters and receivers. 

In Chapter 5, we look at methods of reducing the long time delays incurred 
by ergodic interference alignment. We decrease the delay for full performance 
of the scheme, and analyse the tradeoff between reducing delay and lowering 
the communication rate. 

In Chapter 6, we outline a problem of discovering which users interfere 
with which, a situation that is equivalent to the problem of pooled group
testing for defective items. We then present some new work that uses
information-theoretic techniques to attack group testing. We introduce for the first
time the concept of the group testing channel, which allows for modelling of a
wide range of statistical error models for testing. We derive new results on the 
number of tests required to accurately detect defective items, including when 
using sequential 'adaptive' tests. 

Chapter 7 concludes and gives pointers for further work. 
Introduction



A central problem in the operation of large wireless networks is how to deal 
with interference - the unwanted signals being sent by transmitters that a
receiver is not interested in. This thesis looks at ways of combating such
interference in large random wireless networks.

In Chapter 1: Information, we briefly summarise information theory in the 
single user (point-to-point) case. 

A channel models how signals are corrupted by noise. We pay particular 
attention to the Gaussian channel, which is a good model for real-world wire- 
less communication, and the finite field channel, which can be thought of as a 
discretisation of the Gaussian channel.

The capacity of a channel tells us how much information we can send 
through the channel for an arbitrarily low probability of error. Shannon's chan- 
nel coding theorem tells us how to calculate the capacity of a channel. We also 
demonstrate the capacity of the Gaussian channel under a power constraint. 

Fading models how signals can decay and distort when sent over long
distances. We investigate three types of fading - fixed, slow, and fast - and show
how they affect the channel capacity.

In Chapter 2: Interference, we extend our study to multiuser networks.

We look at information theoretic models of wireless networks, concentrat- 
ing on the interference network, where many transmitter-receiver pairs want 
to communicate through the same medium. This network suffers from the 
problem of interference. 

Weak interference can be ignored and treated as background noise, while 
strong interference can be decoded and subtracted. The main problem for 
networks is interference of a similar strength to the desired signal. 
We look at resource division strategies, which share the channel resources
between the users. While such schemes are simple to operate, they perform 
poorly when the number of users is high. 

Of more interest are new interference alignment strategies. These work by 
the following idea: if transmitters plan their signals carefully, then for each 
receiver the interfering signals can be aligned together, with the desired sig- 
nal split off separately. Interference alignment techniques offer potentially far 
higher performance than resource division schemes. We pay particular atten- 
tion to a channel state-based strategy called ergodic interference alignment. 

Chapter 3: Regular and Poisson random networks, shows how a simple 
interference-as-noise technique can be useful when communicating over short 
hops in well-structured networks. 

In a d-dimensional regular network, nodes are placed on the grid $\mathbb{Z}^d$. We
show that if signals decay like $\text{distance}^{-\alpha}$ for $\alpha > d$, then all nodes can
communicate at some fixed rate r. We call this linear growth, as the sum-rate of
communication of a collection of nodes scales linearly with the number of
nodes.

We also look at nearest-neighbour communication in Poisson random net- 
works, where nodes are placed at random like a Poisson point process. We 
give bounds on the outage probability, the chance that a given link is unable to 
communicate at some fixed rate. We also show that linear growth occurs with 
probability tending to 1. 

This chapter is joint work with Oliver Johnson and Robert Piechocki. 

In Chapter 4: Sum-capacity of random dense Gaussian interference net- 
works, we consider spatially separated IID networks with power-law attenuation, a 
natural model for wireless networks. We derive the asymptotic sum-capacity 
of such networks by using ergodic interference alignment to show achievabil- 
ity, and subtle probabilistic and counting techniques to show the converse. 

We also give an alternative proof (with an improved rate of convergence) 
to a recent theorem of Jafar on the sum-capacity of large random networks. 

This chapter is joint work with Oliver Johnson and Robert Piechocki. This
research has been published in IEEE Transactions on Information Theory [1], and
in the Proceedings of the 2010 IEEE International Symposium on Information
Theory [2].

Chapter 5: Delay-rate tradeoff in ergodic interference alignment considers 
the long blocklengths required to perform ergodic interference alignment. We 
outline a new scheme called JAP(a) and study a beamforming extension and
derived child schemes.

We show how to reduce the time delay for full performance of ergodic 
interference alignment. We also show how delay can be reduced even further 
for the tradeoff of a decrease in communication rate. We analyse the best
schemes for small networks, and as the size of the network tends to infinity. 

This chapter is joint work with Oliver Johnson and Robert Piechocki. This research 
has been submitted to IEEE Transactions on Communication Theory - a preprint
is available on the arXiv [3]. 

We begin Chapter 6: Interference, group testing, and channel coding by
considering a problem where receivers must aim to detect which transmitters
interfere with them. We show that our formulation of this problem is equivalent
to the problem of combinatorial group testing.

Recent work by Atia and Saligrama has shown how channel coding tech- 
niques can shed light on the problem of group testing. We extend their results, 
by defining group testing channels, and identifying the only-defects-matter prop- 
erty, under which an important theorem holds. 

We give the first information theoretic analysis of adaptive group testing -
where test pools can be constructed sequentially based on previous outcomes 
- by drawing a comparison with the problem of channel coding with feed- 
back. 

The thesis finishes with Chapter 7: Conclusions and further work. 
Information



The fundamental problem of communication is that of reproduc- 
ing at one point either exactly or approximately a message selected 
at another point. 

— Claude E Shannon 
A Mathematical Theory of Communication [4, page 1]

In this chapter, we examine the subject of information theory, and in par- 
ticular channel coding, which forms the mathematical basis for studying com- 
munication. 

We start by giving a brief overview of the subject, and making note of a 
'handbook' of useful definitions and facts. We then go through the more for- 
mal mathematics of point-to-point communication, concentrating on accurate 
models for real-life wireless communication. Finally, we study fading, which 
allows us to model how signals distort as passed through space. 

1.1 Information theory: a very short introduction

Information theory is the mathematical framework used for studying the trans- 
mission of messages in the presence of noise. It was founded by Claude Shan- 
non in his seminal paper of 1948, "A mathematical theory of communication" 
[4]. Shannon's information theory involves the sending of a message - infor- 
mation that a transmitter (typically called Alice) might wish a receiver (Bob) 
to know - through a channel, such as a telephone line, an internet connection 
or a computer cable. 

[Figure: Alice sends a message through a channel to Bob.]
The method, language or standard used to transfer the message is called
the code. (We use 'code' in the sense of 'Morse code' - there is no intention to 
keep the transmitted signal secret as well.) 

One of the major goals of information theory involves quantifying how 
much information can be sent down a channel and how quickly. For example, 
if Alice wants to arrange a meeting with Bob, she might send the message "Hi 
Bob, meet me at five o'clock on Monday." But if the line is very crackly, Bob
might interpret the message incorrectly: "Hi Bob, meet me at nine o'clock on
Sunday." How could Alice ensure that this doesn't happen? Perhaps Alice
could use a code where she repeats the sentence a number of times, hoping
that it would be more likely that Bob could deduce the intended message. But
this takes a longer amount of time - we say that her rate of communication is
very low - and phone calls are expensive, making this undesirable.

Unsurprisingly, there is a trade-off to be made between the rate at which 
Alice can send the information and the probability that Bob receives it without 
error. Before Shannon, it was widely assumed that the only way to make the 
error probability as small as desired was to reduce the rate of communication 
toward zero too [5, Section 5.1]. However, Shannon discovered that, while 
this trade-off certainly exists, the error probability can be made arbitrarily small
while maintaining a communication rate bounded away from zero.







In other words, there is a cutoff rate c such that if we attempt to send
information at a rate r < c, we can do so with an arbitrarily low risk of error,
whereas if we attempt to send information at a rate r > c, the probability of 
error is bounded away from 0. Shannon called this cutoff rate c the capacity of 
the channel. 



[Figure: the rate axis r, with error-free communication at rates r < c and errors at rates r > c.]



Shannon managed to calculate the capacity of a number of communication
channels in terms of the statistical properties of the noise in a channel. The 
result is known as Shannon's channel coding theorem (and is stated as Theorem 
1.11 later). 

The aim of this thesis is to find bounds and approximations for capacities
of complicated multi-user networks. In particular, we will be interested in net- 
works that model large real-world wireless networks, such as WiFi computer 
networks or BlueTooth. 

The capacity of a channel, such as a wireless link, tells us the maximum 
rate at which we can send information while being assured the messages are 
received accurately. Note however that merely knowing the capacity does 
not give us a method of achieving communication at, or even near, capac- 
ity. Nonetheless, the capacity is still a useful benchmark for the quality of 
a channel. First, it gives us a 'best case' against which we can compare any 
technologies: if a new code allows us to communicate at a rate near capac- 
ity, then this technology is about as good as it's going to get, and there is no 
need to spend more money on research. Second, studying the mathemati- 
cal form of the capacity may help us improve the channel itself; for instance, 
whether extra resources would be best spent increasing power, bandwidth, or 
the number of antennas. 

1.2 Handbook of useful facts 

The following basic concepts of information theory will be referred to often in 
this thesis; we collect them here for reference. 

More information is available in any basic information theory textbook - 
Cover and Thomas's Elements of Information Theory [6] is a favourite of mine. 






Mass and density functions. For a discrete random variable X, we denote
its probability mass function by $p(x) := \mathbb{P}(X = x)$. If X is continuous,
$p(x)$ denotes its probability density function. Joint and conditional mass/density
functions are denoted $p(x, y)$ and $p(y \mid x)$ respectively.



Entropy and related concepts. The entropy of a discrete random variable
X is

$$H(X) := \mathbb{E} \log \frac{1}{p(X)} = \sum_x p(x) \log \frac{1}{p(x)}, \tag{HB1}$$

where here, as everywhere in this thesis, $\log := \log_2$ denotes the binary
logarithm. When X is continuous, the sum is replaced by an integral. (This
last comment holds for all the following definitions.)
The joint entropy of the pair (X, Y) is similarly

$$H(X, Y) := \mathbb{E} \log \frac{1}{p(X, Y)} = \sum_x \sum_y p(x, y) \log \frac{1}{p(x, y)}. \tag{HB2}$$

The conditional entropy of Y given X is

$$H(Y \mid X) := \mathbb{E} \log \frac{1}{p(Y \mid X)} = \sum_x \sum_y p(x, y) \log \frac{1}{p(y \mid x)}. \tag{HB3}$$

If X and Y are independent, then it is easy to show [6, Theorem 2.6.5] that 

$$H(Y \mid X) = H(Y), \qquad H(Y + X \mid X) = H(Y). \tag{HB4}$$

The relative entropy distance from one probability function $p(x)$ to another
$q(x)$ is

$$D(p(x) \parallel q(x)) := \mathbb{E}_{p(x)} \log \frac{p(X)}{q(X)} = \sum_x p(x) \log \frac{p(x)}{q(x)} \geq 0. \tag{HB5}$$

The mutual information between X and Y is

$$I(X : Y) := D(p(x, y) \parallel p(x)p(y)) = \mathbb{E} \log \frac{p(X, Y)}{p(X)p(Y)}. \tag{HB6}$$

It is easy to show [6, Theorem 2.4.1] that

$$I(X : Y) = H(Y) - H(Y \mid X). \tag{HB7}$$
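
The definitions above can be checked numerically on a small example. The following is a minimal Python sketch, not part of the original development: the function names and the joint distribution `joint` are hypothetical and chosen only for illustration. It computes the mutual information as a relative entropy and compares it with the difference of entropies in (HB7).

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

def conditional_entropy(joint):
    """H(Y | X) in bits, where joint[x][y] = p(x, y)."""
    px = [sum(row) for row in joint]
    return sum(
        pxy * math.log2(px[x] / pxy)      # log 1/p(y|x) = log p(x)/p(x,y)
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

def mutual_information(joint):
    """I(X : Y) in bits, as the relative entropy from p(x,y) to p(x)p(y)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(
        pxy * math.log2(pxy / (px[x] * py[y]))
        for x, row in enumerate(joint)
        for y, pxy in enumerate(row)
        if pxy > 0
    )

# Hypothetical joint distribution on {0,1} x {0,1}, for illustration only.
joint = [[0.4, 0.1],
         [0.2, 0.3]]
py = [sum(col) for col in zip(*joint)]
lhs = mutual_information(joint)                  # left-hand side of (HB7)
rhs = entropy(py) - conditional_entropy(joint)   # right-hand side of (HB7)
```

For any valid joint distribution the two quantities `lhs` and `rhs` should agree up to floating-point error, and both should be nonnegative.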



Hu correspondence. A useful way to memorise the relationship between
these concepts is the Hu correspondence. In the following picture, the area
of each rectangle corresponds to the quantity.



[Figure: Hu correspondence diagram, with rectangles labelled H(X), H(Y), H(X, Y), H(X | Y), H(Y | X) and I(X : Y).]



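The complex Gaussian distribution. A circularly-symmetric complex Gaussian
random variable $Z \sim \mathcal{CN}(0, \sigma^2)$ with variance $\sigma^2 = \mathbb{E}|Z|^2$ is defined by
the probability density function

$$p(z) = \frac{1}{\pi\sigma^2} \exp\left(-\frac{|z|^2}{\sigma^2}\right), \qquad z \in \mathbb{C}. \tag{HB8}$$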



Maximum entropy. Out of all discrete random variables X on the set
$\{0, 1, \dots, q - 1\}$, the maximum entropy is achieved when X is uniform [6,
Theorem 2.6.4], giving

$$\max_X H(X) = H\big(\mathrm{U}(\{0, 1, \dots, q - 1\})\big) = \log q. \tag{HB9}$$

Of all continuous random variables Z on $\mathbb{C}$ with power $\mathbb{E}|Z|^2$ at most
$\sigma^2$, the maximum entropy is achieved when $Z \sim \mathcal{CN}(0, \sigma^2)$ [6, Example
12.2.1], giving

$$\max_{Z : \mathbb{E}|Z|^2 \leq \sigma^2} H(Z) = H\big(\mathcal{CN}(0, \sigma^2)\big) = \log(\pi e \sigma^2). \tag{HB10}$$



Typical set. Let X be a random variable on a countable set $\mathcal{X}$. Given
a sequence $\mathbf{X} = (X[1], X[2], \dots, X[T]) \in \mathcal{X}^T$ of T random draws from X,
then $\mathbf{X}$ will very likely take one of only $2^{TH(X)} \leq |\mathcal{X}|^T$ different values,
and each of these values is almost equally likely. These values make up the
so-called typical set, and this property is called the asymptotic equipartition
property.

We say $(\mathbf{x}, \mathbf{y})$ is jointly typical of (X, Y) if $\mathbf{x}$ is in the typical set of X, $\mathbf{y}$
is in the typical set of Y, and the pair $(\mathbf{x}, \mathbf{y})$ is in the typical set of the pair
(X, Y).
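
To make the typical set concrete, the following Python sketch counts typical sequences for T IID Bernoulli(p) draws. It assumes one common formalisation that the handbook does not spell out: a sequence is called typical when $-\frac{1}{T} \log$ of its probability lies within a tolerance `delta` of H(X). The function name and the parameter values are hypothetical.

```python
import math

def typical_set_stats(p, T, delta):
    """Entropy, size and probability of the typical set for T IID Bernoulli(p) draws.

    A sequence with k ones has probability p^k (1-p)^(T-k). It is counted as
    typical when -(1/T) log2 of that probability is within delta of H(X).
    """
    H = p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p))
    size, prob = 0, 0.0
    for k in range(T + 1):
        log_prob = k * math.log2(p) + (T - k) * math.log2(1 - p)
        if abs(-log_prob / T - H) <= delta:
            count = math.comb(T, k)          # number of sequences with k ones
            size += count
            prob += count * 2.0 ** log_prob  # their total probability
    return H, size, prob

# Illustrative parameters only.
H, size, prob = typical_set_stats(p=0.2, T=200, delta=0.1)
```

Under this definition the size of the typical set is at most $2^{T(H(X) + \delta)}$, which is far smaller than the $2^T$ possible sequences when H(X) < 1, while the typical set still carries most of the probability.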
1.3 Channels, codes, and capacity

We will now set out mathematically the ideas of channels, codes, and capacity 
from Section 1.1. 

To specify a channel, we need to say what inputs are allowed into the chan- 
nel, what outputs can be produced, and how the noise randomly corrupts the 
input. 

A common example of a channel is the binary symmetric channel. The BSC 
allows 'bits' - binary digits: 0s and 1s - into the channel. It then either outputs
the same bit or, with some fixed probability, flips to the other bit. 

Definition 1.1. A communication channel consists of 

1. a set $\mathcal{X}$, the input alphabet;

2. a set $\mathcal{Y}$, the output alphabet;

3. a probability transition function $p(y \mid x)$ relating the two.



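[Figure: a channel taking an input from alphabet $\mathcal{X}$, passing it through $p(y \mid x)$, and producing an output in alphabet $\mathcal{Y}$.]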



Definition 1.2. We can now formally define the binary symmetric channel with
error probability $p < \tfrac{1}{2}$. This channel is defined by alphabets $\mathcal{X} = \mathcal{Y} = \{0, 1\}$
and transition function

$$p(0 \mid 0) = 1 - p, \qquad p(1 \mid 0) = p,$$
$$p(0 \mid 1) = p, \qquad p(1 \mid 1) = 1 - p.$$




A channel which is an accurate and widely-used model for wireless communication
[5, Chapter 5.1] is the Gaussian channel. It arises from the sampling
of a bandlimited continuous-time channel with white noise [7]. (For
convenience, we assume a unit bandwidth.) Gaussian white noise is used for
two reasons. First, it seems a reasonable model: the superposition of lots of
small pieces of noise ought to (due to the central limit theorem) look roughly 
Gaussian. Second, it is the simplest case mathematically, and often leads to 
analytically tractable solutions. 

The Gaussian channel takes any number as an input and corrupts it by
adding random Gaussian noise. By convention, and for useful modelling reasons,
the channel is usually defined in terms of complex numbers [5, Subsection
2.2.4].

Definition 1.3. The Gaussian channel with noise power $\sigma^2$ has alphabets
$\mathcal{X} = \mathcal{Y} = \mathbb{C}$, and probability transition density $p(y \mid x)$ defined implicitly by the
relationship $Y = x + Z$, where the $Z \sim \mathcal{CN}(0, \sigma^2)$ are all independent.

(If the channel is used multiple times, we assume a new random Z is 
drawn each time.) 

[Figure: transition density of the Gaussian channel (real part only).]




Another channel we will examine in this thesis is the finite field channel. 
This channel can be a useful model of a Gaussian channel that has been quan- 
tised (or discretised). 

Definition 1.4. Let q be prime, and let Z be a random variable defined on the
finite field $\mathbb{F}_q = \{0, 1, \dots, q - 1\}$. The finite field channel of size q with noise
Z has alphabets $\mathcal{X} = \mathcal{Y} = \mathbb{F}_q$, and probability transition density $p(y \mid x)$
defined implicitly by the relationship $Y = x + Z \pmod q$. (Again, multiple
uses assume independent Zs.)

Note that the BSC is a special case of the finite field channel where $q = 2$
and $Z = 1$ with probability p or $Z = 0$ otherwise.

Now that we have a channel, we can design codes for that channel. A code 
takes a set of M messages, and encodes each message into a string - called a 
codeword - of length T. After the channel has been used T times to send the 
codeword, there must be a rule for decoding the received string, to estimate 
which message was sent. 

Definition 1.5. An (M, T)-code for the channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$ consists of:

1. a message set $\mathcal{M}$ of cardinality M,

2. an encoding function $\mathbf{x} : \mathcal{M} \to \mathcal{X}^T$,

3. a decoding function $\hat{m} : \mathcal{Y}^T \to \mathcal{M}$.

The set of codewords $\{\mathbf{x}(m) : m \in \mathcal{M}\}$ is called the codebook; the parameter
T is called the block length.

The problem of channel coding works like this: 

• The transmitter (Alice) wishes to send a message $m \in \mathcal{M}$.

• Alice encodes this message into a codeword $\mathbf{x}(m) \in \mathcal{X}^T$.

• Alice sends the first letter $x(m)[1]$ of the codeword through the channel
to the receiver (Bob). Bob receives a corrupted version $y[1] \in \mathcal{Y}$ of the
letter, where the corruption has occurred at random according to
$p(y \mid x)$.

• Alice sends the second letter $x(m)[2]$ of the codeword through the channel
to Bob. Bob receives a corrupted version $y[2] \in \mathcal{Y}$ of the letter, where
the corruption has occurred at random according to $p(y \mid x)$.

• ...

• Alice sends the final letter $x(m)[T]$ of the codeword through the channel
to Bob. Bob receives a corrupted version $y[T] \in \mathcal{Y}$ of the letter, where
the corruption has occurred at random according to $p(y \mid x)$.

• Bob now decodes his received word $\mathbf{y}$ to make his estimate $\hat{m}(\mathbf{y})$ of
Alice's original message m. Hopefully, $\hat{m} = m$, and the message has been
communicated successfully.
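
The steps above can be sketched as a short simulation. The following Python code is an illustration under stated assumptions, not a construction from this thesis: it uses a binary symmetric channel, and the three-letter `codebook`, the `majority` decoder and all function names are hypothetical.

```python
import random

def bsc(bit, p, rng):
    """One use of a binary symmetric channel: flip the bit with probability p."""
    return bit ^ int(rng.random() < p)

def send(message, codebook, decode, p, rng):
    """Encode a message, transmit it letter by letter, and decode the output."""
    codeword = codebook[message]                        # x(m)
    received = tuple(bsc(x, p, rng) for x in codeword)  # y[1], ..., y[T]
    return decode(received)                             # Bob's estimate of m

# Hypothetical (2, 3)-code, chosen only for illustration.
codebook = {"No": (0, 0, 0), "Yes": (1, 1, 1)}

def majority(received):
    """Decode to "Yes" if more than half of the received letters are 1."""
    return "Yes" if 2 * sum(received) > len(received) else "No"

def estimate_error(p, trials=10_000, seed=0):
    """Monte Carlo estimate of the average error probability."""
    rng = random.Random(seed)
    messages = list(codebook)
    errors = 0
    for _ in range(trials):
        m = rng.choice(messages)
        errors += send(m, codebook, majority, p, rng) != m
    return errors / trials
```

Calling `estimate_error(0.1)` gives an empirical error rate for this toy code; because each channel use draws fresh randomness, the simulated channel is memoryless.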



So the system as a whole looks like this (the channel is in grey):

[Figure: block diagram of the encoder, channel, and decoder.]

All channels considered in this thesis will be memoryless, in that the channel's
current performance is independent of earlier behaviour. In other words,
the transmission of codewords follows a product distribution

$$p(\mathbf{y} \mid \mathbf{x}) = \prod_{t=1}^{T} p(y[t] \mid x[t]).$$

So far, we have considered only static channels, where the probability transition
function remains fixed over time. (Later, we will look at fast-fading
channels, where the transition function is no longer fixed but changes from
timeslot to timeslot.)

Note that, from a mathematical point of view, what the messages are is
unimportant - what matters is how many of them there are. So it often makes
sense to choose the message set $\mathcal{M}$ to be something convenient. For example,
when dealing with a finite field channel of size q, we often take $\mathcal{M}$ to be $\mathbb{F}_q^S$,
which has cardinality $M = q^S$. In particular, when $q = 2$, the message set
$\mathcal{M} = \mathbb{F}_2^S = \{0, 1\}^S$ is the set of all bit strings of length $S = \log_2 M$.

We define the rate of a code to be the number of bits that we can send
per channel use. The number of bits is $\log_2 M$, as above, and the number of
channel uses is T, so the rate is $(\log_2 M)/T$.

Definition 1.6. The rate of an (M, T)-code is defined to be (log2 M)/T bits per 
transmission. 

(From now on, all logarithms are to base 2, and we just write log for log2.) 

Also associated with a code we have its error probability, the chance that a 
message is decoded incorrectly. (We take the average error probability across 
all messages, but if we were to use the maximum error probability, the main 
results of this chapter would be the same.) 

Definition 1.7. The average error probability is

$$e := \frac{1}{M} \sum_{m \in \mathcal{M}} \sum_{\mathbf{y} \in \mathcal{Y}^T} p(\mathbf{y} \mid \mathbf{x}(m)) \, \mathbb{1}[\hat{m}(\mathbf{y}) \neq m].$$

We can now give an example of a code for the BSC.

Suppose there are two messages we might wish to send: "No" and "Yes",
so $\mathcal{M} = \{\text{No}, \text{Yes}\}$.

A very simple code could assign 0 to be the codeword for "No" and 1
to be the codeword for "Yes". This is a (2, 1)-code. It clearly has a rate of
(log 2)/1 = 1 bit per transmission and error probability p.

To reduce the error probability, we will need a more sophisticated code.
One method of coding would be the repetition code where each symbol is
repeated T times, so

$$\mathbf{x}(\text{No}) = 00 \cdots 0 \in \mathcal{X}^T, \qquad \mathbf{x}(\text{Yes}) = 11 \cdots 1 \in \mathcal{X}^T.$$



Definition 1.8. A (2, T)-code is called a T-repetition code if the codebook con- 
sists solely of the all-0 and all-1 codewords of length T. 
The obvious method of decoding is the 'majority rule': if $\mathbf{y}$ has more 0s
than 1s, decode as "No"; if $\mathbf{y}$ has more 1s than 0s, decode as "Yes". (If there
are exactly T/2 of each, decoding may be performed arbitrarily.)

Note that the rate of this code is (log 2)/T = 1/T, and the error probability
is bounded by $e \geq p^T$, the probability that all T symbols flip.
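
The rate and error probability of the repetition code can be tabulated exactly. The Python sketch below assumes that ties (possible only for even T) are broken by a fair coin, which is one permissible choice since the text allows arbitrary decoding in that case; the function name is hypothetical.

```python
import math

def repetition_error(p, T):
    """Average error probability of the T-repetition code over a BSC with
    error probability p, under majority decoding. Ties are broken by a fair
    coin (an assumption made for this sketch)."""
    err = 0.0
    for k in range(T + 1):                 # k = number of flipped symbols
        prob_k = math.comb(T, k) * p**k * (1 - p)**(T - k)
        if 2 * k > T:
            err += prob_k                  # a majority of symbols flipped
        elif 2 * k == T:
            err += 0.5 * prob_k            # tie: guess
    return err

p = 0.1
for T in (1, 3, 5, 7):
    # block length, rate 1/T, exact error probability, lower bound p^T
    print(T, 1 / T, repetition_error(p, T), p**T)
```

The printed rows show the trade-off described above: increasing T lowers the error probability, but only at the cost of a rate 1/T that shrinks toward zero.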

What does it mean to be able to communicate through a channel at some de- 
sired rate r? Well, it means that there must exist a code for the channel with 
rate at least r and a low probability of error. How low? As low as we desire. 
If we want to limit the error probability to 5%, then there must be a code with 
rate at least r and error probability no more than 5%; but if we want the error 
probability to be as low as 1% or even 0.01%, there has to be a code for that 
too. 

Definition 1.9. Consider a channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$.

A rate r is achievable if for any error tolerance e > 0, there exists a code with 
rate at least r and error probability lower than e. 

Otherwise, r is not achievable, in that there exists an error threshold e such 
that there exists no code with rate at least r and error probability lower than e. 

The capacity is defined to be the maximum achievable rate. 

Definition 1.10. Consider a channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$. Then we define the
capacity of the channel, c, to be the supremum of all achievable rates:

$$c := \sup\{r : r \text{ is achievable}\}.$$

In other words, all rates r less than c are achievable, but no r above c is 
achievable. 

Shannon calculated the capacity as the maximum mutual information be- 
tween the input and output of a channel [4, Theorem 11]. 

The mutual information I(X : Y) between X and Y can be seen as a measure
of 'how dependent' X and Y are. If I(X : Y) is large, then X and Y are
highly dependent, so knowledge of the output Y gives us lots of information 
about the input X; if I(X : Y) is small, then X and Y are highly independent, 
so knowledge of the output Y gives us little information about the input X. 

Theorem 1.11 (Shannon's channel coding theorem). Consider a discrete channel
$(\mathcal{X}, \mathcal{Y}, p(y \mid x))$, that is, a channel where $\mathcal{X}$ and $\mathcal{Y}$ are both countable sets.

Then the capacity c is given by the formula $c = \max_X I(X : Y)$, where Y is
related to X through $p(y \mid x)$, and the maximum is over all input random variables
X defined on $\mathcal{X}$.
The characterisation of capacity given by Shannon's channel coding theorem
(Theorem 1.11) allows us to calculate the capacity of the finite field channel
and the BSC.

Theorem 1.12. The capacity of the finite field channel of size q with noise Z is

$$c = \log q - H(Z) =: D(Z).$$

The capacity of the BSC with error probability p is

$$c = 1 - \left( p \log \frac{1}{p} + (1 - p) \log \frac{1}{1 - p} \right).$$



[Figure: capacity of the binary symmetric channel against error probability p.]



(We use the abbreviation $D(Z) := \log q - H(Z)$ since this is equal to the
relative entropy distance $D(p(z) \parallel p(u))$ between Z and a uniform random
variable U over $\mathbb{F}_q$.)

Proof. Finite field channel. By Shannon's channel coding theorem (Theorem
1.11) we need to calculate the mutual information. This is

$$\begin{aligned}
I(X : Y) &= H(Y) - H(Y \mid X) && \text{(HB7)} \\
&= H(Y) - H(X + Z \mid X) && (Y = X + Z, \text{ Definition 1.4}) \\
&= H(Y) - H(Z). && \text{(HB4)}
\end{aligned}$$

This is maximised by choosing X to be uniform on $\mathbb{F}_q$, so that Y is
uniform also and H(Y) = log q by (HB9), giving

$$c = \max_X I(X : Y) = \log q - H(Z) = D(Z),$$

as required.

BSC. This result follows from recalling that the BSC is a special case of the
finite field model. □
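
As a numerical check of Theorem 1.12, the following Python sketch evaluates both capacity formulas. The function names are hypothetical, and the noise distribution is passed as a list of probabilities indexed by the elements of $\mathbb{F}_q$; the code does not check that q is prime.

```python
import math

def entropy(pmf):
    """Entropy in bits of a pmf given as an iterable of probabilities."""
    return sum(p * math.log2(1 / p) for p in pmf if p > 0)

def finite_field_capacity(noise_pmf):
    """c = log q - H(Z), where noise_pmf[z] = P(Z = z) and q = len(noise_pmf)."""
    q = len(noise_pmf)
    return math.log2(q) - entropy(noise_pmf)

def bsc_capacity(p):
    """c = 1 - (p log 1/p + (1 - p) log 1/(1 - p))."""
    return 1 - entropy([p, 1 - p])

# The BSC is the q = 2 case with P(Z = 1) = p and P(Z = 0) = 1 - p.
p = 0.1
c_bsc = bsc_capacity(p)
c_ff = finite_field_capacity([1 - p, p])
```

The two values `c_bsc` and `c_ff` should coincide, reflecting the fact that the BSC is a special case of the finite field channel. At p = 1/2 the capacity drops to zero, since the output is then independent of the input.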

Later in this thesis, we will often see examples of channels and networks 
whose capacities are a constant fraction of the finite field channel capacity. If
a channel or network has capacity $c = d \, D(Z)$ for some constant d, we say
that the channel has d degrees of freedom (also known as the multiplexing gain or
pre-log term).

Definition 1.13. Given a discrete channel or network with capacity c, we define
the degrees of freedom to be $\mathrm{dof} = c / D(Z)$.

Clearly, the finite field channel itself has a single degree of freedom, that is
we have $\mathrm{dof} = 1$.

We have not yet talked about the proof of Shannon's channel coding theorem 
(Theorem 1.11). 

To prove Shannon's channel coding theorem (and related theorems) we 
must prove two things: 

Achievability First, we must show that any rate below capacity r < c is 
achievable. That is, we must find a sequence of codes all with rates at 
least r < c, but with arbitrarily low error probabilities. 

Converse Second, we must show that any rate above capacity r > c is not 
achievable. That is, we must show that for any sequence of codes all 
with rates at least r > c, the error probabilities must be bounded away 
from 0. 

The converse part is proved using Fano's inequality [8], which bounds
the error probability in terms of the conditional entropy across the channel
$H(Y \mid X)$ and the size of the message set M.

Shannon's key insight into the achievable part is the following: instead of 
trying carefully to design special codes with high rates and low error prob- 
abilities, we can instead just pick the code at random. That is, we choose 
the codeword letters $X(m)[t]$ IID according to some distribution X. If we set
$M = \lceil 2^{Tr} \rceil$, then the rate of the code will be at least r. We hope that by choosing
T sufficiently large, the error probability will be driven arbitrarily low.
(Later, we can optimise over the choice of X.)

This random encoder can be twinned with an effective decoder to show
that any rate r < c can be achieved. Two different decoders can be used:

Joint typicality decoder The receiver takes the channel output $\mathbf{y}$ and finds
the unique codeword $\mathbf{x}(m)$ such that the pair $(\mathbf{x}(m), \mathbf{y})$ is jointly typical
of (X, Y). (See the handbook, Section 1.2, for definitions.) This m is the
decoding estimate. (If there is no such $\mathbf{x}(m)$ or it isn't unique, we declare
an error.)

Maximum likelihood decoder The receiver takes the channel output $\mathbf{y}$ and
decodes to the message most likely to have yielded it. That is, we pick
m to maximise

$$p(\mathbf{y} \mid \mathbf{x}(m)) = \prod_{t=1}^{T} p(y[t] \mid x(m)[t]).$$

(If the maximum isn't unique, we declare an error.)
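
A maximum likelihood decoder for a small code can be sketched directly from the product formula. The Python code below assumes a BSC and uses exact rational arithmetic so that ties are detected reliably; the function names and the four-letter `codebook` are hypothetical and serve only as an illustration.

```python
from fractions import Fraction

def likelihood(y, x, p):
    """p(y | x) for a memoryless BSC: a product over the T channel uses."""
    result = 1
    for y_t, x_t in zip(y, x):
        result *= p if y_t != x_t else 1 - p
    return result

def ml_decode(y, codebook, p):
    """Return the message maximising p(y | x(m)), or None (a declared
    error) when the maximum is not unique."""
    scores = {m: likelihood(y, x, p) for m, x in codebook.items()}
    best = max(scores.values())
    winners = [m for m, s in scores.items() if s == best]
    return winners[0] if len(winners) == 1 else None

# Hypothetical (2, 4)-code, for illustration only.
codebook = {"No": (0, 0, 0, 0), "Yes": (1, 1, 1, 1)}
p = Fraction(1, 10)                            # exact arithmetic for ties
decoded = ml_decode((0, 1, 0, 0), codebook, p)  # one flipped letter
tie = ml_decode((0, 0, 1, 1), codebook, p)      # two of each: not unique
```

For a BSC with p < 1/2, maximising the likelihood amounts to choosing the codeword that differs from $\mathbf{y}$ in the fewest positions, so on this codebook the decoder agrees with the majority rule whenever there is no tie.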
Shannon himself [4, Section 13] and most subsequent authors (for example
[6, Chapter 7], [9, Chapter 10]) use the joint typicality approach, as it gives a 
fairly simple and short proof. 

On the other hand, Gallager [10] used the maximum likelihood approach 
to prove Shannon's channel coding theorem, by bounding the error probability
by $e \leq 2^{-TE(r)}$, where E(r) is called the error exponent, and examining the
error exponent for different values of r. We shall return to Gallager's maxi- 
mum likelihood approach in Chapter 6, when proving a similar theorem for 
group testing. 

We now outline the achievability proof using the joint typicality decoder. 
Basic facts about typical sets are given in the handbook (Section 1.2). 

Sketch proof of achievability, Theorem 1.11. As above, we set the number of messages
to be $M = \lceil 2^{Tr} \rceil$, choose codeword letters $X(m)[t]$ IID at random according
to some distribution X, and decode using a joint typicality decoder.
There are two ways we could get an error.

First, the actual codeword $\mathbf{X}$ and the received output $\mathbf{Y}$ could fail to be
jointly typical. But the theory of typical sets tells us that this event is very
unlikely.

Second, another codeword $\mathbf{X}(\tilde{m})$ could be jointly typical with $\mathbf{Y}$, despite
$\mathbf{X}(\tilde{m})$ and $\mathbf{Y}$ actually being independent of each other. Since $\mathbf{X}(\tilde{m})$ and $\mathbf{Y}$ are
very likely to be (marginally) typical, joint typicality occurs with approximate
probability

$$\begin{aligned}
\frac{\#\,\text{jointly typical } (\mathbf{x}, \mathbf{y})}{\#\,\text{typical } \mathbf{x} \times \#\,\text{typical } \mathbf{y}}
&\approx \frac{2^{TH(X,Y)}}{2^{TH(X)} \, 2^{TH(Y)}} \\
&= 2^{-T(H(X) + H(Y) - H(X,Y))} \\
&= 2^{-TI(X : Y)}
\end{aligned}$$

by standard facts about typical sets (see the handbook, Section 1.2). Hence the
probability of error is approximately

$$\begin{aligned}
e &\leq \sum_{\tilde{m} \neq m} \Pr\big(\mathbf{X}(\tilde{m}) \text{ and } \mathbf{Y} \text{ jointly typical}\big) \\
&\approx (M - 1) \, 2^{-TI(X : Y)} \\
&= \big(\lceil 2^{Tr} \rceil - 1\big) \, 2^{-TI(X : Y)} \\
&< 2^{Tr} \, 2^{-TI(X : Y)} \\
&= 2^{-T(I(X : Y) - r)}.
\end{aligned}$$

So provided r < I(X : Y), then by choosing T large enough, the error probability
can be made arbitrarily small.
Choose X to maximise I(X : Y) to get the result. □
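
To get a feel for the block lengths involved, the following Python sketch evaluates the approximate quantity $2^{-T(I(X : Y) - r)}$ from the proof sketch for a BSC with uniform input. It is a heuristic illustration only: it ignores the first error event and the slack in the typical set estimates, so it is not a rigorous bound. The function names and parameter values are hypothetical.

```python
import math

def bsc_mutual_information(p):
    """I(X : Y) in bits for a BSC with error probability p and uniform input."""
    if p in (0, 1):
        return 1.0
    return 1 - (p * math.log2(1 / p) + (1 - p) * math.log2(1 / (1 - p)))

def heuristic_error(p, r, T):
    """The approximate quantity 2^{-T (I(X:Y) - r)} from the proof sketch."""
    return 2.0 ** (-T * (bsc_mutual_information(p) - r))

def block_length_for(p, r, target):
    """Smallest T for which the heuristic quantity is at most target."""
    gap = bsc_mutual_information(p) - r
    if gap <= 0:
        raise ValueError("rate must be below the mutual information")
    return math.ceil(math.log2(1 / target) / gap)

# Illustrative values: error probability 0.1, rate 0.4, target error 1%.
T_needed = block_length_for(p=0.1, r=0.4, target=0.01)
```

The required block length grows as the rate r approaches I(X : Y), since the exponent I(X : Y) - r shrinks; for rates at or above I(X : Y) the heuristic gives no guarantee at all.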

In a sense, this theorem is qmte a 'lucky' resiilt: it turns out that the lower 
bound on capacity given by Shannon's random coding argument and the 
upper bound given by Fano's inequality coincide, to give us the equality 
c = maxxI(X : Y). 

For other networks, we may not be so lucky. However, similar proof
strategies can be useful. If we can show that all rates below some $r_*$ are
achievable, this gives us a lower bound on capacity: $c \ge r_*$. Conversely, if
we can show that no rates above some $r^*$ are achievable, then we have an upper
bound: $c \le r^*$. In the point-to-point case, we have $r_* = r^* = c$. But even
if there is a gap between the upper and lower bounds, the result can be useful
in giving us an approximation to the capacity. In particular, sometimes there
may be a limiting sense in which the upper and lower bounds are asymptotically
equal - for example, as signal power or number of users tends to infinity.

It is often useful to think not just of individual codes, but of families of
codes. One family of codes we have already seen is the repetition code
(Definition 1.8).

MacKay [9, Section 11.4] divides families of codes into three separate cate- 
gories, depending on how effective they are for their channel. 

Bad codes In bad families of codes, as we force the error probability to 0, the
rate of the codes approaches 0 also.

Good codes In good families of codes, as we force the error probability to 0,
the rate of the codes is bounded above 0, but below capacity.

Very good codes In very good families of codes, as we force the error prob- 
ability to 0, the rate of the codes can be maintained arbitrarily close to 
capacity. 

Earlier we saw that the rate of the repetition code is 1/T, and its error
probability is bounded by $\epsilon \ge p^T$. Hence, to force the error
probability $\epsilon$ to 0, we must send $T \to \infty$, and the rate tends to 0.
Hence, the repetition code is a bad code. (MacKay notes, however, that bad codes
are not necessarily practically useless [9, p. 183].)
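
The trade-off can be seen numerically. The sketch below assumes a binary symmetric channel with crossover probability p, odd T and majority decoding (these are assumptions for the example; the function name is hypothetical). It prints the rate 1/T next to the exact error probability and the lower bound $p^T$.

```python
from math import comb

def repetition_code(T, p):
    """Rate and majority-decoding error probability of the T-repetition code.

    Assumes a binary symmetric channel with crossover probability p and odd T.
    """
    rate = 1 / T
    # Majority decoding fails when more than half of the T letters are flipped.
    error = sum(comb(T, k) * p**k * (1 - p)**(T - k)
                for k in range(T // 2 + 1, T + 1))
    return rate, error

for T in (1, 3, 5, 11, 21):
    rate, error = repetition_code(T, p=0.1)
    print(f"T={T:2d}  rate={rate:.3f}  error={error:.2e}  p^T={0.1**T:.2e}")
```

The error probability falls as T grows, but only at the cost of the rate falling towards 0.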

It is sufficient for this thesis to note that Shannon's channel coding theorem 
(Theorem 1.11) tells us that very good (capacity achieving) codes do exist, 
and that we can do much better than simple codes like the repetition code. 
(The design of good and very good practical codes is outside the scope of this 
thesis.) 

In later work in this thesis, rather than finding new codes from scratch,
we will instead adapt these very good point-to-point channel codes for use in
large networks.

A useful class of codes for finite field channels is the class of linear codes.
If we take $\mathcal{M} = \mathbb{F}_q^S$ for the message set again, then a linear
code is a code whose encoding function $x : \mathbb{F}_q^S \to \mathbb{F}_q^T$ is
a linear map.

It's often useful to represent this linear map by an S x T matrix G, so
x(m) = Gm. We call G the generator matrix of the code. The rate of such a
code is (log M)/T = (S/T) log q.

Definition 1.14. Consider a finite field channel of size q. Then a linear code is
a $(q^S, T)$-code with message set $\mathcal{M} = \mathbb{F}_q^S$ and encoding
function x(m) = Gm for some generator matrix $G \in \mathbb{F}_q^{S \times T}$.
Any decoding function may be used. We call S the rank of the code.

For example, the T-repetition code is a linear code with field size q = 2,
rank S = 1 and 1 x T generator matrix $G = (1 \; 1 \; \cdots \; 1)$.
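
A minimal sketch of encoding with a generator matrix is given below. It assumes q is prime, so that arithmetic modulo q gives the field, and it uses the row-vector convention x = mG, which is one reading compatible with the S x T shape of G; the function name `encode` is hypothetical.

```python
def encode(message, G, q):
    """Encode a length-S message over F_q with an S x T generator matrix G.

    Uses the row-vector convention x = mG (mod q); q is assumed prime so
    that arithmetic modulo q is field arithmetic.
    """
    S, T = len(G), len(G[0])
    assert len(message) == S
    return [sum(message[s] * G[s][t] for s in range(S)) % q for t in range(T)]

# The T-repetition code with T = 5: q = 2, S = 1, G = (1 1 1 1 1).
G_rep = [[1, 1, 1, 1, 1]]
print(encode([1], G_rep, q=2))   # [1, 1, 1, 1, 1]
print(encode([0], G_rep, q=2))   # [0, 0, 0, 0, 0]
```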

The important fact about linear codes (at least for finite field channels) is
that, when paired with an optimal decoder, very good (capacity achieving)
linear codes exist [9, Chapter 14]. Thus, if we restrict our attention only to
linear codes, we can still achieve all rates up to the capacity c = D(Z) of the
finite field channel.

Theorem 1.15. Very good linear codes exist for all finite field channels with nonzero 
capacity. 

The current state of the art for high-rate practical codes - that is, codes
with low encoding and decoding complexity and moderate block lengths - is a
class of random linear codes called low-density parity-check codes [9, Chapter
47]. (See the textbook of Richardson and Urbanke [11] for more details.)

1.4 Power 

We have not yet looked at codes for the Gaussian channel. 

The capacity of the Gaussian channel is infinite, as there exist codes with 
arbitrarily high rates and simultaneously arbitrarily low error probabilities. 

To see this, consider the following. Let $\mathcal{M} = \{1, 2, \dots, M\}$,
encode using x(m) = mN for some very large N, and decode to the nearest positive
integer to y/N (which should be roughly m). This is an (M, 1)-code, with rate
log M. By picking N large enough, the error probability can be made arbitrarily
small; but by picking M large enough, the rate can be made arbitrarily high.
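
The following simulation is a sketch of this unconstrained scheme. For simplicity it assumes real Gaussian noise of unit variance rather than complex noise, and it clips the decoded value to {1, ..., M}; both choices, and the function name, are assumptions of the example.

```python
import random

def simulate_scaled_code(M, N, trials=10_000, seed=0):
    """Estimate the error rate of the (M, 1)-code x(m) = m * N.

    Assumes real Gaussian noise of unit variance for simplicity.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        m = rng.randint(1, M)                 # message, uniform on {1, ..., M}
        y = m * N + rng.gauss(0.0, 1.0)       # channel output
        m_hat = min(max(round(y / N), 1), M)  # nearest valid integer to y / N
        errors += (m_hat != m)
    return errors / trials

print(simulate_scaled_code(M=1024, N=1))     # many decoding errors
print(simulate_scaled_code(M=1024, N=100))   # an error needs |noise| > 50
```

The price of the second setting is the transmit power, which grows like $(MN)^2$.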

This is neither mathematically interesting nor physically realistic. Antennas
for wireless networks are not capable of transmitting at arbitrarily high
powers. Thus, we introduce a power constraint: that for all codewords x, the
power - the mean square value - is limited by a prescribed value P.

Definition 1.16. The power of a codeword x is defined to be

$$|x|^2 := \frac{1}{T} \sum_{t=1}^{T} |x[t]|^2.$$

The power of a code is defined to be the maximum power of any code- 
word. 

So we want to limit our attention to codes whose power is at most the 
power constraint P. 

Definition 1.17. Consider a channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$.

A rate r is achievable with power P if for any error tolerance $\epsilon > 0$,
there exists a code of power at most P with rate at least r and error
probability lower than $\epsilon$.

Otherwise, r is not achievable with power P, in that there exists an error
threshold $\epsilon$ such that there exists no code of power at most P with rate
at least r and error probability lower than $\epsilon$.

Definition 1.18. Consider a channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$.
Then we define the capacity of the channel with power P to be the supremum of
all achievable rates:

$$c := \sup\{r : r \text{ is achievable with power } P\}.$$

In other words, all rates r less than c are achievable with power P, but no r
above c is achievable.

Shannon calculated the capacity of the Gaussian channel with a power
constraint in his original paper [4, Theorem 17].

Theorem 1.19. Consider the Gaussian channel with power constraint P and noise
power $\sigma^2$.

Then the capacity is given by the formula c = log(1 + snr), where we have
defined the signal-to-noise ratio $\text{snr} := P/\sigma^2$ to be the ratio of
the signal power to the noise power.



[Figure: capacity of the Gaussian channel with a power constraint, plotted
against the signal-to-noise ratio snr.]

Two useful approximations for the capacity of the Gaussian channel are

$$c = \log(1 + \text{snr}) \approx
  \begin{cases}
    \text{snr} \log e & \text{for small snr,} \\
    \log \text{snr}   & \text{for large snr.}
  \end{cases} \tag{1.1}$$

So at low snr, capacity grows linearly with snr; whereas at high snr, capacity
only grows logarithmically.
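
The two regimes can be checked numerically. The sketch below takes logarithms to base 2, so capacity is in bits per channel use; the function names are chosen for the example.

```python
import math

def capacity(snr):
    """Gaussian channel capacity log2(1 + snr), in bits per channel use."""
    return math.log2(1 + snr)

def low_snr_approx(snr):
    """Linear approximation snr * log2(e), accurate for small snr."""
    return snr * math.log2(math.e)

def high_snr_approx(snr):
    """Logarithmic approximation log2(snr), accurate for large snr."""
    return math.log2(snr)

for snr in (0.001, 0.01, 1.0, 100.0, 10_000.0):
    print(snr, capacity(snr), low_snr_approx(snr), high_snr_approx(snr))
```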

Later in this thesis, we will often see examples of channels and networks
whose capacities at high snr are a constant fraction of the Gaussian channel
capacity $c \approx \log \text{snr}$. If a channel or network has capacity
$c = d \log \text{snr} + o(\log \text{snr})$ as $\text{snr} \to \infty$ for some
constant d, we say that the channel has d degrees of freedom. (This is the
Gaussian analogue of the finite field degrees of freedom in Definition 1.13.)

Definition 1.20. Given a channel with capacity c(snr) and power constraint
$P = \text{snr} \, \sigma^2$, we define the degrees of freedom to be

$$\text{dof} := \lim_{\text{snr} \to \infty} \frac{c(\text{snr})}{\log \text{snr}},$$

where this limit exists.

From (1.1) the Gaussian channel itself has a single degree of freedom, that
is, we have dof = 1.
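
The limit can be approximated by evaluating the ratio at a large but finite snr. In the sketch below, `half_rate` is a hypothetical channel with capacity (1/2) log(1 + snr), included only to show a value of dof other than 1.

```python
import math

def dof_estimate(capacity, snr):
    """Finite-snr estimate c(snr) / log(snr) of the degrees of freedom."""
    return capacity(snr) / math.log2(snr)

def gaussian(snr):
    return math.log2(1 + snr)

def half_rate(snr):
    # Hypothetical channel that achieves half the Gaussian capacity.
    return 0.5 * math.log2(1 + snr)

for snr in (1e2, 1e6, 1e12):
    print(snr, dof_estimate(gaussian, snr), dof_estimate(half_rate, snr))
```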

Sketch proof of Theorem 1.19. In a similar manner to Shannon's channel coding
theorem (Theorem 1.11), it can be shown that the capacity of the Gaussian
channel with a power constraint is $\max_{X : \mathbb{E}|X|^2 \le P} I(X : Y)$.
(This is certainly a believable result: it is the same formula as for discrete
channels, with the additional constraint that the expected power satisfies the
constraint.)

It remains to calculate the mutual information. This is

$$\begin{aligned}
I(X : Y) &= H(Y) - H(Y \mid X)     && \text{(HB7)} \\
         &= H(Y) - H(X + Z \mid X) && (Y = X + Z, \text{ Definition 1.3}) \\
         &= H(Y) - H(Z)            && \text{(HB5)} \\
         &= H(Y) - \log(\pi e \sigma^2). && (Z \sim \mathcal{CN}(0, \sigma^2), \text{ Definition 1.3})
\end{aligned}$$

Note that

$$\mathbb{E}|Y|^2 = \mathbb{E}|X + Z|^2 = \mathbb{E}|X|^2 + \mathbb{E}|Z|^2 \le P + \sigma^2.$$

Hence the entropy of Y is maximised by choosing X to be complex Gaussian
with variance P, by (HB10), so that Y is Gaussian also, with power
$P + \sigma^2$, giving

$$c := \max I(X : Y) = \log(\pi e (P + \sigma^2)) - \log(\pi e \sigma^2)
     = \log\left(1 + \frac{P}{\sigma^2}\right)$$

as required. □



Note that the input distribution that achieves capacity is
$X \sim \mathcal{CN}(0, P)$. So the signal is statistically the same - that is,
distributed in the same parametric family - as noise, but with a different
power. This fact will be useful later.

As we mentioned earlier, Theorem 1.19 tells us that very good (capacity
achieving) codes exist for the Gaussian channel. Later, we will adapt these
very good point-to-point codes for use in large Gaussian networks.



1.5 Fading

In their standard forms, the finite field and Gaussian channels are represented
by the formula Y[t] = x[t] + Z[t]. We can interpret this as the signal being
transmitted perfectly through the channel, except for the addition of some
noise. However, for a more realistic model of wireless networks, we need
to account for the way the signal itself transforms as it is sent through the
channel. For example, in the Gaussian channel, we might expect the signal
power to decay over long distances, and standard physical models suggest
that the phase arg x of the signal will alter as it is transmitted through space
[5, Section 2.1].

We can model these concepts by introducing a fading (or channel state) coef- 
ficient H[t]. Our channels now become Y[t] = H[t]x[t] +Z[t]. 
We are interested in three cases:

Fixed fading where H[t] = h is a fixed deterministic constant (Subsection
1.5.1);

Fast fading where the H[t] are IID random (Subsection 1.5.2);

Slow fading where H[t] = H is random, but fixed for all time (Subsection
1.5.3).

Before we continue, we have a useful simplification to make. Since the fading
Gaussian channel

$$Y[t] = H[t] x[t] + Z[t], \qquad Z[t] \sim \mathcal{CN}(0, \sigma^2), \qquad |x|^2 \le P$$

will be used a lot in this thesis, it makes sense to change our units, so that the
noise power and power constraint are both unity. To that end, set

$$\tilde{Y}[t] := \frac{1}{\sigma} Y[t], \qquad
  \tilde{H}[t] := \sqrt{\frac{P}{\sigma^2}} \, H[t], \qquad
  \tilde{x}[t] := \frac{1}{\sqrt{P}} \, x[t], \qquad
  \tilde{Z}[t] := \frac{1}{\sigma} Z[t].$$



Under this change of units we have, after dividing through by $\sigma$,

$$\tilde{Y}[t] = \tilde{H}[t] \tilde{x}[t] + \tilde{Z}[t], \qquad
  \tilde{Z}[t] \sim \mathcal{CN}(0, 1), \qquad |\tilde{x}|^2 \le 1. \tag{1.2}$$

From now on, we will solely use this model, so we shall drop the tildes. Note
that under this change of units, the signal-to-noise ratio

$$\frac{P |H|^2}{\sigma^2} = \frac{1 \cdot |\tilde{H}|^2}{1}$$

remains the same. (Note also that under this change, the Gaussian channel
with no fading inherits a fixed fading coefficient
$H[t] = h = \sqrt{P/\sigma^2}$.)
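
A quick numerical check of this change of units is sketched below. The particular values of h, P and sigma, the codeword x and the function names are assumptions of the example.

```python
import math

def normalise(h, x, P, sigma):
    """Rescale a fading coefficient h and codeword x to unit noise and power."""
    h_tilde = math.sqrt(P) / sigma * h
    x_tilde = [xt / math.sqrt(P) for xt in x]
    return h_tilde, x_tilde

def power(x):
    """Mean square value of a codeword."""
    return sum(abs(xt) ** 2 for xt in x) / len(x)

# Assumed example values.
h, P, sigma = 0.5 + 0.5j, 4.0, 3.0
x = [2.0, -2.0, 2.0j, -2.0j]          # a codeword using the full power P
h_tilde, x_tilde = normalise(h, x, P, sigma)

print(P * abs(h) ** 2 / sigma ** 2)   # signal-to-noise ratio, original units
print(abs(h_tilde) ** 2)              # signal-to-noise ratio, new units
print(power(x_tilde))                 # power in the new units
```

The first two printed values agree, and the rescaled codeword has unit power.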

1.5.1 Fixed fading 

Fixed fading models fading that is constant and predictable, such as the decay 
in signal power between a non-moving transmitter and a non-moving receiver 
within a fixed environment. 

We model the fading coefficient as a deterministic constant fixed for all 
time, H[t] = h for all t, giving Y[t] = hx[t] + Z[t]. 

How does the capacity alter now? 

For the finite field channel we have $h \in \mathbb{F}_q$. Note that for nonzero
h, the function $x \mapsto hx$ is a bijection, so the channel is equivalent to
that without fading (h = 1), and still has capacity D(Z), from Theorem 1.12. On
the other hand, if h = 0, then Y is always 0, and all signals are
indistinguishable, so the capacity is 0. Hence, for the finite field channel,
the capacity is

$$c(h) = \begin{cases} 0 & \text{if } h = 0, \\ D(Z) & \text{otherwise.} \end{cases}$$

(For simplicity, we often just assume that h is nonzero, so the capacity is
unchanged as c = D(Z).)

For the Gaussian channel with power constraint P = 1 we have
$h \in \mathbb{C}$. The power constraint is now
$|hx|^2 = |h|^2 |x|^2 \le |h|^2$. So this channel is equivalent to one with no
fading, but with the power constraint changed from 1 to $|h|^2$. The capacity is
thus

$$c = c(h) = \log(1 + |h|^2) = \log(1 + \text{snr}),$$

with the new convention that snr denotes the signal-to-noise ratio at the
receiver: $\text{snr} = |h|^2 P / \sigma^2 = |h|^2$.
To summarize:

Theorem 1.21. For $h \ne 0$, the capacity of the finite field channel with fixed
fading is

c = D(Z).

The capacity of the Gaussian channel with fixed fading is c = log(1 + snr),
where $\text{snr} = |h|^2$.

For modelling wireless networks, we will often use a Gaussian channel
with fixed fading coefficient h that decays like a power law over distance. That
is, we have $h = k \rho^{-\alpha/2}$, where $\rho > 0$ is the distance between a
transmitter and receiver and $k > 0$ is a constant.

The parameter $\alpha > 0$ - called the attenuation - represents how resistive
the environment is to the transmission of radio waves. (Some authors call
$\alpha/2$ the attenuation; we do not.) Low $\alpha$ represents an environment
with few obstacles to signals; high $\alpha$ implies that a lot of the signal
power is absorbed before reaching the receiver. In free space, standard physical
considerations imply that $\alpha = 2$ or 3; for built-up areas, values of
$\alpha$ of roughly 4 or 5 seem more appropriate [5, Section 2.1]. The capacity
of such a channel is $c = \log(1 + k^2 \rho^{-\alpha})$, by Theorem 1.21.

(Power-law attenuation fails to be realistic for small distances $\rho \ll 1$.
Here the received power would be greater than the transmitted power, which
violates the conservation of energy. Some authors therefore prefer alternative
models such as $h = \min\{1, k \rho^{-\alpha/2}\}$ or
$h = k (\rho + \rho_0)^{-\alpha/2}$ for some fixed constant $\rho_0$.)

We could also include a fixed phase change in this model by setting
$h = k \rho^{-\alpha/2} e^{i\theta}$. In free space, the phase would scale
linearly with distance, so $h = k \rho^{-\alpha/2} e^{2\pi i \rho / \lambda}$,
where $\lambda$ is the carrier wavelength [12]. Note that we still have
$|h|^2 = k^2 \rho^{-\alpha}$, so the capacity is the same.
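
The distance dependence of the capacity, and the fact that the phase does not affect it, can be sketched as follows. The default values of k and alpha, the wavelength and the function names are assumptions of the example; logarithms are to base 2.

```python
import cmath
import math

def fading_coefficient(rho, k=1.0, alpha=4.0, wavelength=None):
    """Power-law fading h = k * rho^(-alpha/2), with optional free-space phase."""
    h = k * rho ** (-alpha / 2)
    if wavelength is not None:
        h *= cmath.exp(2j * math.pi * rho / wavelength)
    return h

def capacity(h):
    """Fixed fading Gaussian capacity log2(1 + |h|^2)."""
    return math.log2(1 + abs(h) ** 2)

for rho in (1.0, 2.0, 10.0):
    plain = fading_coefficient(rho)
    phased = fading_coefficient(rho, wavelength=0.125)
    print(rho, capacity(plain), capacity(phased))
```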

1.5.2 Fast fading 

Fast fading models a situation where the state of the channel is changing
rapidly, such as a commuter using a mobile phone on a train. We model this
as H[t] being random according to some distribution H but renewing at each
channel use; that is, the H[t] are independent and identically distributed like
H.

When we deal with fast fading, the performance of a channel will depend 
on whether the transmitter and receiver know the current value of H[t], or just 
the general distribution H, and whether the transmitter can use this knowl- 
edge to vary their power. 

Throughout this thesis, we assume that both the transmitter and the
receiver know H[t]. This is known as having perfect channel state information at
the transmitter (CSIT) and at the receiver (CSIR). We presume that this
knowledge is causal, that is, the receiver and transmitter learn H[t]
immediately prior to the transmission of x[t] and reception of y[t]
respectively. In other words, they have no prediction of the channel future to
use (except, of course, knowing the future channel states will be IID according
to H).

When the transmitter has CSIT in a Gaussian channel, we must specify
whether or not she can use this information to operate at varying power. So
there are two different types of power constraint. Let x[t](h) be the tth
codeword letter chosen in channel state h, and let $\mathcal{H}$ be the support
of H. (Recall from our simplification (1.2) that we now have P = 1.)

Universal A universal power constraint demands that the power constraint is
held universally over each channel state realisation. That is, we demand

$$|x(h)|^2 = \frac{1}{T} \sum_{t=1}^{T} |x[t](h)|^2 \le 1 \quad \text{for all } h \in \mathcal{H}.$$

Average An average power constraint demands that the power constraint is held
when averaged over all channel state realisations. That is, we demand

$$\mathbb{E} \, |x(H)|^2 = \sum_{h \in \mathcal{H}} \mathbb{P}(H = h) \, \frac{1}{T} \sum_{t=1}^{T} |x[t](h)|^2 \le 1.$$

(The second term assumes H is discrete; the summation can be replaced
by an integral if H is continuous.)

An average power constraint allows a transmitter to use extra power when
the channel is at its strongest, and save power when the channel is weak. (For
more details, see the textbook of Tse and Viswanath [5, Subsection 5.3.3].)

In this thesis, we always assume a universal power constraint. First, it is 
mathematically simpler to deal with. Second, it is physically unrealistic for 
transmitters to operate above their average power for long periods of time. 

(Similarly, in a frequency-selective channel, one must specify whether the
power constraint is enforced in each individual subchannel, or as an average
across all frequencies.)

So, assuming perfect CSIT and CSIR (with a universal power constraint in 
the Gaussian case) we have the following. 

Theorem 1.22. Consider a fast fading channel, and let c(h) be the capacity of the
channel under fixed fading parameter h.

Then the fast fading capacity is equal to the average fixed fading capacity, in
that $c = \mathbb{E} \, c(H)$.

In the most general case, this theorem is due to Goldsmith and Varaiya
[13]. We sketch the achievability proof for the case when H is discrete.
(Goldsmith and Varaiya attribute the result for this simpler case to Wolfowitz
[14, Theorem 4.6.1].)

Sketch proof. Assume H is discrete, in that it can only take values in some
countable set $\mathcal{H}$. Then if we 'collect together' all occasions when
H[t] has some particular value h, we can treat that collection of channel uses
as being through a fixed fading channel with deterministic fading coefficient h.
So at these times, we can achieve rates up to c(h).

Writing $\pi(h, T)$ for the proportion of time periods when H[t] = h,

$$\pi(h, T) := \frac{1}{T} \left| \{ t \in \{1, 2, \dots, T\} : H[t] = h \} \right|, \qquad h \in \mathcal{H},$$

we can achieve the rate

$$r = \lim_{T \to \infty} \sum_{h \in \mathcal{H}} \pi(h, T) \, c(h).$$

But by the strong law of large numbers, we have the ergodicity property that
$\lim_{T \to \infty} \pi(h, T) = \mathbb{P}(H = h)$ (almost surely). Hence (again
almost surely),

$$c \ge \sum_{h \in \mathcal{H}} \mathbb{P}(H = h) \, c(h) = \mathbb{E} \, c(H).$$

The converse can be proved using Fano's inequality, as with Shannon's
channel coding theorem (Theorem 1.11).

The result for continuous H can be derived from this using a quantisation
argument [13, Appendix]. □
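
The 'collecting together' step of the achievability argument can be imitated in a short simulation, sketched below for the Gaussian fixed fading capacity c(h) = log(1 + |h|^2) with logarithms to base 2. The three fading states, their probabilities and the function name are assumptions of the example.

```python
import math
import random

def fast_fading_rates(states, probs, T, seed=0):
    """Compare the time-shared rate sum_h pi(h, T) c(h) with E c(H).

    `states` are the possible fading values h and `probs` their probabilities.
    """
    def c(h):
        return math.log2(1 + abs(h) ** 2)             # Gaussian c(h)

    rng = random.Random(seed)
    draws = rng.choices(states, weights=probs, k=T)   # IID channel states
    empirical = sum(draws.count(h) / T * c(h) for h in states)
    expected = sum(p * c(h) for h, p in zip(states, probs))
    return empirical, expected

print(fast_fading_rates([0.5, 1.0, 2.0], [0.2, 0.5, 0.3], T=100_000))
```

For large T the two numbers agree closely, as the strong law of large numbers suggests.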

Since this result relies on the sequence of fading parameters being ergodic,
c is sometimes called the ergodic capacity. So the above result can be interpreted
as 'the ergodic capacity is the average capacity'. (Later, we will see how using
interference alignment in networks can allow us to achieve an ergodic capacity
that is higher than the average capacity.)

Applying Theorem 1.22 to the finite field channel (Theorem 1.12), we get

$$c = \mathbb{E} \, c(H) = \sum_{h \in \mathcal{H}} \mathbb{P}(H = h) \, c(h) = (1 - \mathbb{P}(H = 0)) \, D(Z).$$

(Again, we often assume H is never 0, so c = D(Z) still.)
For the Gaussian channel (Theorem 1.19), we have

$$c = \mathbb{E} \, c(H) = \mathbb{E} \log(1 + |H|^2) = \mathbb{E} \log(1 + \text{SNR}).$$

(Since SNR is random here, we capitalise it.)

One type of fast fading for the Gaussian channel could be a rapidly changing
phase, $H[t] = k e^{i \Theta[t]}$, where $\Theta[t] \sim U[0, 2\pi)$ IID over t.
This is a good model of wireless communication when there are many paths a
signal could take from transmitter to receiver (in a built-up area, for
example) [5, Subsection 2.4.2]. Note that here the signal-to-noise ratio is
constant, so the capacity is unchanged.

Another model of wireless communication is Rayleigh fading [5, Subsection
2.4.2], where $H[t] \sim \mathcal{CN}(0, \tau^2)$ for some $\tau > 0$. In this
case, $|H[t]|^2$ is exponentially distributed with mean $\tau^2$ [5, (2.53)].
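
A Monte Carlo sketch of the Rayleigh case is given below: it samples H with independent real and imaginary parts of variance tau^2 / 2 each, and averages log(1 + |H|^2) with logarithms to base 2. The value of tau, the sample size and the function name are assumptions of the example.

```python
import math
import random

def rayleigh_ergodic_capacity(tau, samples=100_000, seed=0):
    """Monte Carlo estimates of E|H|^2 and E log2(1 + |H|^2), H ~ CN(0, tau^2)."""
    rng = random.Random(seed)
    sd = tau / math.sqrt(2)   # real and imaginary parts have variance tau^2 / 2
    total_snr = total_cap = 0.0
    for _ in range(samples):
        snr = rng.gauss(0, sd) ** 2 + rng.gauss(0, sd) ** 2   # |H|^2
        total_snr += snr
        total_cap += math.log2(1 + snr)
    return total_snr / samples, total_cap / samples

mean_snr, ergodic_capacity = rayleigh_ergodic_capacity(tau=2.0)
print(mean_snr)                                  # close to tau^2 = 4
print(ergodic_capacity, math.log2(1 + 4.0))      # versus fixed fading with |h|^2 = 4
```

By Jensen's inequality the ergodic capacity is below log(1 + tau^2), the capacity of a fixed fading channel with the same mean signal-to-noise ratio.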



1.5.3 Slow fading

Slow fading models the situation where the state of a channel is varying, but
is doing so very slowly, or where the channel state can only be modelled as
random, but remains fixed. Here we take H[t] = H as initially random, but
remaining fixed for all times t = 1, 2, ..., T.

Since the channel state is random, so is the capacity C = c(H): if the fading
is particularly deep, $H \approx 0$, then the capacity is likely to be very low;
if the fading is lighter, then the capacity will be higher. Specifically, under
the event that H = h we have C = c(h).

(As with H, when the capacity is a random variable, we capitalise it as C.)

One way to summarise the random variable C would be through its cumulative
distribution function
$p_{\text{out}}(r) := \mathbb{P}(C < r) = \mathbb{P}(c(H) < r)$, known as
the outage probability. We can interpret this as the following: if we are trying
to communicate at some fixed rate r, then $p_{\text{out}}(r)$ is the probability
that we are unable to do so - we say the channel is in outage.

Definition 1.23. For a slow fading channel with (random) capacity C, the
outage probability $p_{\text{out}} : \mathbb{R}_+ \to [0, 1]$ of the channel is
defined by $p_{\text{out}}(r) := \mathbb{P}(C < r)$.

The event {C < r} is called outage.

For the Gaussian channel, we have (following Theorem 1.19 and recalling
that log denotes $\log_2$)

$$p_{\text{out}}(r) = \mathbb{P}(C < r) = \mathbb{P}(\log(1 + |H|^2) < r) = \mathbb{P}(\text{SNR} < 2^r - 1),$$

where $\text{SNR} = |H|^2$ is the signal-to-noise ratio.

For the finite field channel, we have (following Theorem 1.12)

$$p_{\text{out}}(r) = \begin{cases}
  \mathbb{P}(H = 0) & \text{if } 0 < r \le D(Z), \\
  1 & \text{if } r > D(Z).
\end{cases}$$

In wireless networks, a good model is to position nodes at random and
use distance-based attenuation fading. Since distances between nodes are
random, the fading is random too. But once the nodes are positioned, the
distances remain fixed. Hence, this gives a form of slow fading.
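
As an illustration of the Gaussian outage probability P(SNR < 2^r - 1), the sketch below assumes Rayleigh slow fading, H ~ CN(0, tau^2), so that SNR = |H|^2 is exponentially distributed with mean tau^2. The Rayleigh assumption is made for this example only, and the function name is hypothetical.

```python
import math

def outage_probability(r, tau):
    """P(SNR < 2^r - 1) when SNR = |H|^2 is exponential with mean tau^2."""
    threshold = 2 ** r - 1
    return 1 - math.exp(-threshold / tau ** 2)

for r in (0.5, 1.0, 2.0, 4.0):
    print(r, outage_probability(r, tau=2.0))
```

The outage probability increases with the target rate r and decreases as the mean signal-to-noise ratio tau^2 grows.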



Notes 



The section consists of a review of the existing literature; the mathematical
content is not claimed to be new.

The basic concepts of information theory as outlined in this chapter are all
due to Shannon's original paper [4]. An exception is the concept of relative
entropy distance, due to Kullback and Leibler [15]; and the Hu correspondence,
due to Hu [16].

The presentation here closely follows the textbook of Cover and Thomas 
[6, Chapters 2, 7-9, 15]. The textbooks of MacKay [9, Part II] and Tse and 
Viswanath [5, Chapters 2, 5, 6] were also useful. 

Although Shannon [4, Theorem 11] first came up with the channel coding 
theorem (Theorem 1.11), he provided only a sketch proof; the sketch proof 
provided here is along the lines of the rigorous proof by Cover [17]. The max- 
imum likelihood approach is due to Gallager [10]. 

Fading was first studied by Shannon [18]. Our treatment of fading follows
closely that of Tse and Viswanath [5, Sections 2.1, 5.4]. The review paper of
Biglieri, Proakis, and Shamai (Shitz) [19], and a paper by Caire and Shamai
(Shitz) [20] were useful.



Interference



In this chapter, we will look at ways of dealing with interference in communi- 
cations networks. 

To start with, we will define information theoretic networks, in a similar 
manner to our definition of channels in Chapter 1. 

We will then look at methods of combating interference - the unwanted
signals from other transmitters that a receiver is not interested in.

For the purpose of definiteness, we will consider these in the context of
interference networks and (mostly) the fading Gaussian case. However,
the techniques are useful in wider contexts.

We look at some simple schemes - interference as noise, decode and sub- 
tract, and resource division - and then look at a family of new schemes known 
as interference alignment. We pay particular attention to a scheme called er- 
godic interference alignment, which we will use later in Chapters 4 and 5. 

2.1 Wired and wireless networks 

In this chapter, we will outline the theory of networks. We will concentrate on
accurate models of real-world wireless networks.

Wireless communications are becoming increasingly ubiquitous. From 
older technologies like radios, to cutting-edge innovations such as WiFi, Blue- 
tooth and ZigBee, the convenience of the untethered nature of wireless is pop- 
ular on both large and small scales for businesses and consumers alike. 

Compared to a wired (or wireline) network, wireless networks provide a
much greater challenge to engineers and technicians. The main problems are:

Broadcast Each transmitter can send only one signal, regardless of how many
messages they are trying to send to how many people.

Interference Receivers receive not just the signal corresponding to messages
intended for them, but also all of the other transmitted signals as well.
These signals are called interference.

Superposition Receivers cannot tell which signal corresponds to which
message, but rather receive the superposition (that is, the sum) of all such
signals.



                     Wired networks                  Wireless networks

Transmission         Different signal transmitted    Same signal broadcast to all
                     down each wire                  receivers

Channel              Interference-free, independent  Interference from other
                     noise along each wire           transmitters and background
                                                     noise

Reception            Different signal received       Superposition of all signals
                     from each wire                  received

Central difficulty   Scheduling and routing          Dealing with interference
                     messages around the network



In this thesis, we will mainly be looking at networks where each transmitter
wishes to send a single message to a single receiver, and each receiver
requires a single message from a single transmitter. Thus, the broadcast and
superposition problems are less important than that of interference.

We capture this problem mathematically by modelling the network by a
probability transition function

$$p(y_1, \dots, y_n \mid x_1, \dots, x_n)$$

relating all transmitted signals to all received signals.

[Figure: schematic of a wired network and a wireless network.]




Later in this chapter, we consider a number of methods for dealing with 
such interference. 

2.2 Networks 

Point-to-point links, as we discussed in the previous chapter, are fairly well 
understood. Networks, however, are much trickier. 



By a network, we mean a number of transmitters and receivers, all trying
to send and receive messages through the same medium.

Definition 2.1. A communications network consists of

1. a set $\mathcal{T}$ of transmitters, each with an input alphabet $\mathcal{X}_i$;

2. a set $\mathcal{R}$ of receivers, each with an output alphabet $\mathcal{Y}_j$;

3. a probability transition function
   $p((y_j : j \in \mathcal{R}) \mid (x_i : i \in \mathcal{T}))$ relating them.

(In general, an agent is allowed to be a duplex operator, that is to be both 
a transmitter and a receiver, which can act as a relay in a network. However, 
duplex operation will not be used in this thesis, so our definition precludes 
this.) 

Again, we will be interested in the Gaussian and finite field channels with
fading. Because there are now many transmitters and receivers, we will let
$H_{ji}[t]$ denote the fading coefficient at receiver j from transmitter i.

The Gaussian and finite-field networks work in much the same way as the
point-to-point channels, with the change that now receivers experience the
superposition (that is, sum) of all signals sent.

Definition 2.2. Gaussian networks have
$\mathcal{X}_i = \mathcal{Y}_j = \mathbb{C}$ for all i and j. The probability
transition measure is implied by the relationship

$$Y_j[t] = \sum_{i \in \mathcal{T}} H_{ji}[t] \, x_i[t] + Z_j[t], \qquad j \in \mathcal{R},$$

where $Z_j[t] \sim \mathcal{CN}(0, 1)$ independently across j and t.

The finite field network of size q with noise Z has
$\mathcal{X}_i = \mathcal{Y}_j = \mathbb{F}_q$ for all i and j. The probability
transition measure is implied by the relationship

$$Y_j[t] = \sum_{i \in \mathcal{T}} H_{ji}[t] \, x_i[t] + Z_j[t] \pmod q, \qquad j \in \mathcal{R},$$

where $Z_j[t]$ are independently and identically distributed like Z.

When it's convenient, we will write these networks in matrix form, that is

$$\mathbf{Y}[t] = \mathsf{H}[t] \, \mathbf{x}[t] + \mathbf{Z}[t],$$

where $\mathbf{Y}[t] = (Y_j[t] : j \in \mathcal{R})$ is the received vector,
$\mathbf{x}[t] = (x_i[t] : i \in \mathcal{T})$ is the transmitted vector,
$\mathbf{Z}[t] = (Z_j[t] : j \in \mathcal{R})$ is the noise vector, and
$\mathsf{H}[t] = (H_{ji}[t] : i \in \mathcal{T}, j \in \mathcal{R})$ is the
channel-state matrix.
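
One time step of the matrix form can be sketched for a finite field network as follows. The example assumes q is prime, so that arithmetic modulo q gives the field; the three-user matrix, the inputs and the function name are hypothetical.

```python
def finite_field_network_output(H, x, z, q):
    """One time step of Y = Hx + Z (mod q).

    H[j][i] is the fading coefficient at receiver j from transmitter i.
    """
    return [(sum(H_j[i] * x[i] for i in range(len(x))) + z_j) % q
            for H_j, z_j in zip(H, z)]

# Hypothetical 3-user example over F_5.
H = [[1, 2, 0],
     [3, 1, 4],
     [0, 2, 2]]
x = [1, 4, 2]     # transmitted letters, one per transmitter
z = [0, 1, 3]     # noise realisations, one per receiver
print(finite_field_network_output(H, x, z, q=5))   # [4, 1, 0]
```

Each receiver sees only the sum of all contributions, so the individual signals cannot be read off directly.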

To design a code for a network, we need to specify which transmitters are
trying to send a message to which receivers; then each transmitter needs an
encoding function, and each receiver a decoding function (or more than one,
if they are receiving many messages).

Definition 2.3. A code for the network

$$(\mathcal{T}, \mathcal{R}, (\mathcal{X}_i : i \in \mathcal{T}), (\mathcal{Y}_j : j \in \mathcal{R}), p((y_j : j \in \mathcal{R}) \mid (x_i : i \in \mathcal{T})))$$

consists of

1. a set $\mathcal{L} \subseteq \mathcal{T} \times \mathcal{R}$ of L direct links
   (we call the other links in
   $(\mathcal{T} \times \mathcal{R}) \setminus \mathcal{L}$ the crosslinks);

2. a message set $\mathcal{M}_{ij}$ of cardinality $M_{ij}$ for each link
   $i \to j \in \mathcal{L}$;

3. an encoding function
   $x_i : \prod_{j : i \to j \in \mathcal{L}} \mathcal{M}_{ij} \to \mathcal{X}_i^T$
   for each transmitter $i \in \mathcal{T}$;

4. a decoding function $\hat{m}_{ij} : \mathcal{Y}_j^T \to \mathcal{M}_{ij}$ for
   each link $i \to j \in \mathcal{L}$.

On link $i \to j \in \mathcal{L}$, the rate is $r_{ij} := \log M_{ij} / T$, the
rate vector is $\mathbf{r} = (r_{ij} : i \to j \in \mathcal{L})$, and the sum-rate
is $r_\Sigma := \sum_{i \to j \in \mathcal{L}} r_{ij}$. The error probability on
link $i \to j$ is

$$\epsilon_{ij} := \frac{1}{M_{ij}} \sum_{m \in \mathcal{M}_{ij}} \mathbb{P}(\hat{m}_{ij}(\mathbf{Y}_j) \ne m \mid x_i(m) \text{ sent}).$$

When dealing with the Gaussian case, the power of transmitter i is the
maximum value of $|x_i|^2 := \frac{1}{T} \sum_{t=1}^{T} |x_i[t]|^2$ over all i's
codewords $x_i$.

Some common examples of networks are the following: 

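Definition 2.4. The point-to-point link is just a special case of a network with

$$\mathcal{T} = \{\text{Alice}\}, \qquad \mathcal{R} = \{\text{Bob}\}, \qquad \mathcal{L} = \mathcal{T} \times \mathcal{R} = \{\text{Alice} \to \text{Bob}\}.$$

[Figure: the point-to-point link from Alice to Bob.]

The multiple-access network has multiple transmitters sending to one
receiver, so

$$\mathcal{T} = \{1, \dots, n\}, \qquad \mathcal{R} = \{\text{Bob}\}, \qquad \mathcal{L} = \mathcal{T} \times \mathcal{R} = \{1 \to \text{Bob}, \dots, n \to \text{Bob}\}.$$

[Figure: the multiple-access network.]

The broadcast network has one transmitter sending to multiple receivers, so

$$\mathcal{T} = \{\text{Alice}\}, \qquad \mathcal{R} = \{1, \dots, n\}, \qquad \mathcal{L} = \mathcal{T} \times \mathcal{R} = \{\text{Alice} \to 1, \dots, \text{Alice} \to n\}.$$

The interference network consists of multiple point-to-point links
communicating over the same medium, so

$$\mathcal{T} = \{1, \dots, n\}, \qquad \mathcal{R} = \{1, \dots, n\}, \qquad \mathcal{L} = \{1 \to 1, 2 \to 2, \dots, n \to n\}.$$

(Note that this differs from n independent point-to-point links, since each
receiver is also receiving the signals from the other transmitters, even though
they have no use for that signal.)

[Figure: the interference network with five users.]

The X network consists of an equal number of transmitters and receivers
communicating across all possible links, so

$$\mathcal{T} = \{1, \dots, n\}, \qquad \mathcal{R} = \{1, \dots, n\}, \qquad \mathcal{L} = \mathcal{T} \times \mathcal{R} = \{1 \to 1, 1 \to 2, 1 \to 3, \dots, n \to n\}.$$

[Figure: the X network with five transmitters and five receivers.]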
As with point-to-point links, we are interested in the maximum rate at
which we can send information through a network. However, since we now
have several competing links in the same network, no one benchmark will
describe this. For example, achieving a high rate on one particular link may
use up a lot of channel resources, leading to slower communication on another
link. (Consider trying to hold a conversation in a room where lots of other
people are shouting.)

Instead the set of achievable rate vectors will be a region of L-space; we call
this the capacity region. (Recall from Definition 2.3 that $L = |\mathcal{L}|$ is
the number of links in the network.)

Definition 2.5. Consider a network
$(\mathcal{T}, \mathcal{R}, (\mathcal{X}_i), (\mathcal{Y}_j), p((y_j) \mid (x_i)))$.

A rate vector $\mathbf{r} = (r_{ij} : i \to j \in \mathcal{L})$ is achievable for
links $\mathcal{L}$ if for any error tolerance $\epsilon > 0$, there exists a
code for $\mathcal{L}$ with rate on each link $i \to j$ at least $r_{ij}$ and all
error probabilities lower than $\epsilon$.

Otherwise, $\mathbf{r}$ is not achievable, in that there exists an error
threshold $\epsilon$ such that there exists no code for $\mathcal{L}$ with rates
at least $r_{ij}$ and all error probabilities lower than $\epsilon$.

Definition 2.6. Consider a network
$(\mathcal{T}, \mathcal{R}, (\mathcal{X}_i), (\mathcal{Y}_j), p((y_j) \mid (x_i)))$.
Then we define the capacity region of the network, $\mathcal{C}$, to be the
closure of the set of all achievable vectors:

$$\mathcal{C} := \overline{\{\mathbf{r} \in \mathbb{R}_+^L : \mathbf{r} \text{ is achievable}\}}.$$

In other words, all rate vectors $\mathbf{r}$ in the interior of $\mathcal{C}$
are achievable, but no $\mathbf{r}$ outside $\mathcal{C}$ is achievable.

Note that the capacity region will always be convex: Suppose the rate vectors
$\mathbf{r}_1$ and $\mathbf{r}_2$ are both achievable. Then the rate vector
$\lambda \mathbf{r}_1 + (1 - \lambda) \mathbf{r}_2$, for $\lambda \in [0, 1]$, is
achievable by operating at $\mathbf{r}_1$ for $\lambda T$ of the time points and
at $\mathbf{r}_2$ for the remaining $(1 - \lambda) T$ timeslots. This strategy is
known as time sharing; we discuss this further in Subsection 2.4.1.
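
The time sharing argument is just a convex combination of rate vectors, as the following sketch shows. The two rate vectors are hypothetical values for a two-link network, and the function name is chosen for the example.

```python
def time_share(r1, r2, lam):
    """Rate vector lam * r1 + (1 - lam) * r2 obtained by time sharing."""
    assert 0.0 <= lam <= 1.0
    return [lam * a + (1 - lam) * b for a, b in zip(r1, r2)]

# Hypothetical two-link example: each link alone can achieve rate 1.
r1, r2 = [1.0, 0.0], [0.0, 1.0]
for lam in (0.0, 0.25, 0.5, 1.0):
    print(lam, time_share(r1, r2, lam))
```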

Definition 2.7. We define the sum-capacity to be the maximum achievable
sum-rate, so

$$c_\Sigma := \max r_\Sigma = \sup\{r_\Sigma : \mathbf{r} \text{ is achievable}\}.$$



The current knowledge of capacity regions for these networks in the Gaussian
and general cases is summarised in the table below.

Network           General case                 Gaussian case

Point-to-point    known (Theorem 1.11)         known (Theorem 1.19)

Multiple-access   known [21, 22]               known (Theorem 2.8)

Broadcast         unknown; known for           known [24]
                  some special cases [23]

Interference      unknown; known for           unknown; known for some special
                  some special cases [25]      cases [25]; sum-capacity known
                                               for most two-user cases [26]

X                 unknown                      unknown

Later, we will use the capacity region of the multiple-access network. It
was discovered independently by Ahlswede [21] and Liao [22] in the 1970s.
In the Gaussian case, it simplifies to the following:

Theorem 2.8. The capacity region of the multiple-access network of $n$ transmitters
with fixed fading is the set of $(r_1, r_2, \dots, r_n) \in \mathbb{R}_+^n$ satisfying

\[ \sum_{i \in S} r_i \le \log\Big(1 + \sum_{i \in S} \mathrm{snr}_i\Big) \]

for all $S \subseteq \{1, 2, \dots, n\}$, where $\mathrm{snr}_i$ is the signal-to-noise ratio from
transmitter $i$.

The sum-capacity is

\[ c_\Sigma = \log\Big(1 + \sum_{i=1}^n \mathrm{snr}_i\Big). \]

In this thesis, we will mostly be interested in the $n$-user interference
network. We will use the word 'user' to denote a matching transmitter-receiver
pair. Hence, an $n$-user network consists of $n$ transmitters and $n$ receivers. We
will mostly be interested in the large $n$ limit.

Recent work by Jafar [27] in the fixed snr, $n \to \infty$ regime has shown much
promise. We review Jafar's work in detail later in this chapter and in Chapter
4, and extend it to physical models of wireless networks.

Alternatively, in the fixed $n$, $\mathrm{snr} \to \infty$ regime, Cadambe and Jafar [28] used
interference alignment to deduce the limiting behaviour within $o(\log(\mathrm{snr}))$.
These techniques were extended by the same authors [29] to more general
models in the presence of feedback and other effects.

For small $n$, the classical bounds due to Han and Kobayashi [30] as refined
by Chong, Motani, Garg and El Gamal [31] for the two-user Gaussian
interference network have recently been extended. For example, Etkin, Tse, and
Wang [32] have produced a characterization of capacity accurate to within
one bit. These results were extended by Bresler, Parekh, and Tse [33], using
insights based on a deterministic channel which approximates the Gaussian
channel with sufficient accuracy, to prove results for many-to-one and
one-to-many Gaussian interference channels.

A different approach towards finding the capacity of large communications
networks is given by the deterministic approach of Avestimehr, Diggavi,
and Tse [34]. They show how capacities can be calculated up to a gap
determined by the number of users $n$, across all values of snr.

More generally, in problems concerning networks with a large number of
nodes, the work of Gupta and Kumar [35] uses techniques based on Voronoi
tessellations to establish scaling laws. (See also the survey paper of Xue and
Kumar [36] for a review of the information theoretical techniques that can be
applied to this problem.)

Özgür, Lévêque, and Tse [37] and Özgür and Lévêque [38] use a similar
model of dense random network placements, though using the same points
as both transmitters and receivers. They describe a hierarchical scheme, where
nodes are successively assembled into groups of increasing size, each group
collectively acting as a multiple antenna transmitter or receiver, and restrict to
transmissions at a common rate. They show [37, Theorems 3.1, 3.2] that for
any $\epsilon > 0$ there exists a constant $k = k(\epsilon)$ depending on $\epsilon$ and a fixed constant
$K$ such that

\[ k n^{1-\epsilon} \le c_\Sigma \le K n \log n. \qquad (2.1) \]

These bounds are close to stating that $c_\Sigma$ grows like $n$, but without the explicit
constant that Jafar [27] and the work in Chapter 4 of this thesis achieve. (Later,
we produce a version of the upper bound (2.1) without the logarithmic factor
and being explicit about the constant $K$. Note that this result is proved under
a model that differs from that of Özgür, Lévêque, and Tse [37], and that we
have a total of $2n$ nodes rather than $n$ - although this is unimportant
for asymptotic results. Further, in their work, local collaboration and relaying
are both allowed, meaning that the true rate in their scenario could indeed be
$n \log n$.)



2.3 Interference as noise 



Recall that we mentioned in Section 1.4 that the optimal input distribution to
the Gaussian channel is $\mathcal{CN}(0, P)$, while the noise is $\mathcal{CN}(0, \sigma^2)$. Thus,
interference has the same distribution as noise (after an appropriate scaling). So to
receiver $j$, the received signal

\[ Y_j[t] = \sum_{i=1}^n h_{ji} x_i[t] + Z_j[t], \qquad Z_j[t] \sim \mathcal{CN}(0, 1), \]

is statistically indistinguishable from

\[ Y_j[t] = h_{jj} x_j[t] + Z_j[t], \qquad Z_j[t] \sim \mathcal{CN}\Big(0, 1 + \sum_{i \ne j} |h_{ji}|^2\Big). \]

In other words, treating interference as noise allows user $j$ to communicate
at rate

\[ r_j = \log(1 + \mathrm{sinr}_j) = \log\Big(1 + \frac{|h_{jj}|^2}{1 + \sum_{i \ne j} |h_{ji}|^2}\Big). \]

Here, $|h_{jj}|^2$ is the received power of the signal, and $\sum_{i \ne j} |h_{ji}|^2$ the received
power of the interfering signals from other transmitters. We call

\[ \mathrm{sinr}_j := \frac{|h_{jj}|^2}{1 + \sum_{i \ne j} |h_{ji}|^2} \]

the signal-to-interference-plus-noise ratio at receiver $j$.

Theorem 2.9. Consider an $n$-user Gaussian interference network. The rates
$r_j = \log(1 + \mathrm{sinr}_j)$ are simultaneously achievable.

The convex hull of the set of $\mathbf{r}$ with $r_j \le \log(1 + \mathrm{sinr}_j)$ for all $j$ is an inner bound
for the capacity region.

When the interference is weak, that is when $\sum_{i \ne j} |h_{ji}|^2 \ll 1$, this strategy
will be quite effective, as we will have

\[ \mathrm{sinr}_j \approx \mathrm{snr}_j. \]

Hence, in this situation, user $j$ can communicate at almost the same rate it
could were there no interference at all.

However, if the interference is strong, this strategy will lead to a dramatic
decrease in the rate. In this case we will need different strategies.

2.4 Decode and subtract

The tactic of treating interference as noise suggests a method of dealing with
strong interference.

Suppose we have just one interfering link $2 \to 1$ that is very strong. Then
we can treat the interference as signal, and treat the signal itself as noise. This
allows us to decode the interfering signal $x_2$ (with $x_1$ as noise). Once we have
decoded $x_2$, we know the interference $h_{12} x_2$ (since we have perfect channel
state information), allowing us to subtract it. This forms the interference-free
signal

\[ \tilde{Y}_1 := Y_1 - h_{12} x_2 = (h_{11} x_1 + h_{12} x_2 + Z_1) - h_{12} x_2 = h_{11} x_1 + Z_1. \]

In this case, receiver 1 requires

\[ r_2 \le \log\Big(1 + \frac{|h_{12}|^2}{|h_{11}|^2 + 1}\Big) \]

to decode the interference, and then $r_1 \le \log(1 + \mathrm{snr}_1)$ to decode the signal,
once the interference has been subtracted.

If the 'interference-to-noise-plus-signal ratio' $|h_{12}|^2 / (|h_{11}|^2 + 1)$ is large -
that is, if the interference $|h_{12}|^2$ is strong compared to the signal $|h_{11}|^2$ - then
this will be almost as effective as if no interference were present.

2.5 The problem of mid-level interference

So far, we have two principles:

• Weak interference should be treated as noise.
• Strong interference should first be decoded, and then subtracted.

This leads to a natural question: what about mid-level interference? That 
is, what is the best way of dealing with interference when the power of the 
interference is roughly equal to the power of the signal? 
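To make the two principles concrete, the following minimal sketch evaluates both strategies at receiver 1 of a two-user network. It assumes logarithms to base 2 (rates in bits) and unit noise power; the function names `rate_interference_as_noise` and `decode_and_subtract_limits` and the numerical values are hypothetical choices made for illustration.

```python
import math

def rate_interference_as_noise(snr, inr):
    """Rate log(1 + sinr) for one user, with sinr = snr / (1 + inr)."""
    return math.log2(1.0 + snr / (1.0 + inr))

def decode_and_subtract_limits(snr, inr):
    """Return (limit on r_2, limit on r_1) at receiver 1.

    Receiver 1 first decodes the interferer while treating its own signal
    as noise, then decodes its own signal free of interference.
    """
    r2_limit = math.log2(1.0 + inr / (1.0 + snr))
    r1_limit = math.log2(1.0 + snr)
    return r2_limit, r1_limit

snr = 100.0
for inr in (0.01, 100.0, 1.0e6):   # weak, mid-level, strong interference
    ian = rate_interference_as_noise(snr, inr)
    r2_limit, r1_limit = decode_and_subtract_limits(snr, inr)
    print(f"inr={inr:g}: as noise r1={ian:.2f}; "
          f"decode-and-subtract needs r2<={r2_limit:.2f}, then r1<={r1_limit:.2f}")
```

With inr equal to snr, both the interference-as-noise rate and the rate at which the interferer can be decoded fall below one bit in this sketch, so neither principle copes well with the mid-level case.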

Indeed, there are many plausible real-life situations where mid-level
interference would seem to be the most likely.

For example, in cellular networks (such as mobile phone networks), we
have a phenomenon known as the edge-of-cell effect: near the edge of a cell,
a signal is received at almost the same strength as it would be just outside
the cell. Combating the mid-level interference of these edge-of-cell
transmitters is essential to maintaining a high-quality system.

[Figure: For users near the centre of cells, the signal is much stronger than
the interference. For users near the edge of cells, the signal and interference
are of similar strengths.]

In fact, dealing with this mid-level interference turns out to be particularly
important in multi-user networks; while weak and strong interference are
easily dealt with, even rare occurrences of mid-level interference can severely
restrict the performance of such a network. The following example, due to Jafar
[27], illustrates this.

Consider a two-user Gaussian interference network, as governed by the
input-output equations

\[ Y_1[t] = h_{11} x_1[t] + h_{12} x_2[t] + Z_1[t], \]
\[ Y_2[t] = h_{21} x_1[t] + h_{22} x_2[t] + Z_2[t]. \]



We will use a model with a fast-fading phase. So

\[ h_{11}[t] = \sqrt{\mathrm{snr}_1}\, \exp(i \theta_{11}[t]), \]
\[ h_{12}[t] = \sqrt{\mathrm{inr}_{12}}\, \exp(i \theta_{12}[t]), \]
\[ h_{21}[t] = \sqrt{\mathrm{inr}_{21}}\, \exp(i \theta_{21}[t]), \]
\[ h_{22}[t] = \sqrt{\mathrm{snr}_2}\, \exp(i \theta_{22}[t]), \]

where the $\theta_{ji}[t]$ are IID uniform on $[0, 2\pi)$.

Here $\mathrm{inr}_{ji} = |h_{ji}|^2$ is the interference-to-noise ratio at receiver $j$ from
transmitter $i$.

For simplicity, we shall fix the direct links to be of equal strength:
$\mathrm{snr}_1 = \mathrm{snr}_2 =: \mathrm{snr}$.

Now suppose just one of the two interfering crosslinks is precisely this
mid-level interference: so $\mathrm{inr}_{21} = \mathrm{snr}$ too.




Jafar [27, Lemma 1] then showed the following surprising result: 

Theorem 2.10. The sum-capacity of the above network is $c_\Sigma = \log(1 + 2\,\mathrm{snr})$,
regardless of the value of $\mathrm{inr}_{12}$.

"Regardless of the value of $\mathrm{inr}_{12}$" is worth emphasising: just one
crosslink of this mid-level interference has completely determined the
sum-capacity of the whole network.

In Chapter 4, we shall see similar phenomena for networks with many 
more users, where the study of bottleneck links will be vitally important. 

Proof. Direct part. Achievability follows from using ergodic interference align- 
ment (Theorem 2.14), which we consider later in Section 2.6, or by timesharing 
(Section 2.6.1). 

Converse part. Suppose we have a code allowing us to achieve the sum-rate
$r_\Sigma = r_1 + r_2$.

Suppose a genie provides receiver 2 with transmitter 1's message (which
could only increase the capacity of the network). This allows receiver 2 to
cancel the interference due to $x_1$.

By assumption, receiver 1 can decode its own message, and thus cancel
the intended signal $x_1$ from its received signal $Y_1$.

This leaves the two receivers with statistically equivalent signals.
Therefore, if receiver 2 can decode message $m_2$ - which it can by assumption - then
so can receiver 1.

Since receiver 1 is able to decode both messages, the sum-rate cannot be
more than the sum-capacity of the multiple-access channel seen at receiver
1, which by Theorem 2.8 is at most $\log(1 + 2\,\mathrm{snr})$. □

(We will use a similar proof strategy later to prove Lemma 4.11.)
This shows that dealing well with this mid-level interference will be
particularly important. We examine ways of doing so in the next section.

2.6 Resource division

When faced with mid-level interference, a simple way of dealing with it is to
share out the channel resources between the transmitters, to stop the
interference from getting in the way. (This technique is often known as
orthogonalisation.)

This idea is best illustrated by examples, of which we give three below. 

A big advantage of resource-division strategies is that they are fairly easy 
to set up, and require neither detailed ongoing channel knowledge nor high 
computational complexity. There is also little need for cooperation between 
users after setup. 

In this section, we assume for simplicity that all snrs are equal. 



2.6.1 ... by time 

In schemes which share out the time resource, each transmitter is given sole
use of the channel for some period of time, in return for which they may not
transmit the rest of the time.

Consider the case of the finite-field model. At any particular time, only one
user has control of the channel, and they can communicate up to their
single-user capacity $D(Z)$. All the other transmitters are silent, so the sum-rate is
also $r_\Sigma = D(Z)$.

Each user can communicate at a rate $r = D(Z)/n$, for $\mathrm{dof} = 1/n$ degrees
of freedom each (recall Definition 1.13), which in particular tends to 0 as the
number of users $n$ gets large. Naturally this is undesirable.

For the Gaussian model, transmitters can take advantage of the fact that
they will only be transmitting part of the time, yet are operating under an
average power constraint. Hence, when they are transmitting, transmitters
can use power $nP$ instead, and each user can communicate at a rate
$r = \frac{1}{n} \log(1 + n\,\mathrm{snr})$.

Note that for low snr we have

\[ \frac{r_\Sigma}{n} = \frac{1}{n} \log(1 + n\,\mathrm{snr}) \approx \frac{1}{n}\, n\, \mathrm{snr} \log e = \mathrm{snr} \log e, \]

which is the same as if there was no interference at all. Hence, for low snr,
these schemes are optimal.

For high snr (the more common case), we have

\[ \frac{r_\Sigma}{n} = \frac{1}{n} \log(1 + n\,\mathrm{snr}) \approx \frac{1}{n} (\log \mathrm{snr} + \log n) \approx \frac{1}{n} \log \mathrm{snr}, \]

a reduction over the single-user case of a factor of $1/n$ again. That is, each
user has $\mathrm{dof} = 1/n$ degrees of freedom, for a total of $\mathrm{dof}_\Sigma = 1$ degrees
of freedom all together (recall Definition 1.20).
In summary:

Theorem 2.11. Consider an $n$-user finite field interference network. Then the rates
$r_i = D(Z)/n$ are simultaneously achievable, for a sum-rate of $r_\Sigma = D(Z)$ and
$\mathrm{dof}_\Sigma = 1$.

Consider an $n$-user Gaussian interference network. Then the rates
$r_i = \frac{1}{n} \log(1 + n\,\mathrm{snr}_i)$ are simultaneously achievable. If all snrs are equal, this has a
sum-rate of $r_\Sigma = \log(1 + n\,\mathrm{snr})$, which has $\mathrm{dof}_\Sigma = 1$.
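The low-snr and high-snr behaviour of the Gaussian time-division rate can be checked numerically. The sketch below assumes logarithms to base 2; the function name `time_division_rate` and the values of n and snr are hypothetical choices made for illustration.

```python
import math

def time_division_rate(snr, n):
    """Per-user rate (1/n) * log2(1 + n * snr) under time division."""
    return math.log2(1.0 + n * snr) / n

n = 4
low, high = 1e-4, 1e12

# Low snr: the rate is close to snr * log2(e), as with no interference.
print(time_division_rate(low, n), low * math.log2(math.e))

# High snr: the rate is roughly a fraction 1/n of the single-user rate.
print(time_division_rate(high, n) / math.log2(1.0 + high))
```

The second ratio approaches 1/n only slowly, because of the $\log n$ term in the high-snr approximation.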

Høst-Madsen and Nosratinia showed that $\mathrm{dof} = 1/2$ each, for a total of
$\mathrm{dof}_\Sigma = 1$, is optimal for $n = 2$ users, and, more generally, showed that [39,
Section IV]

\[ 1 \le \mathrm{dof}_\Sigma \le \frac{n}{2} \quad \text{for all } n \ge 2. \qquad (2.2) \]

They further conjectured that in fact it is the lower bound that is tight, and
that $\mathrm{dof}_\Sigma = 1$ is optimal [39, Section IV]. In other words, they conjectured that
resource division strategies were also optimal at high snr. We will later see
that this is not the case.

Note that this scheme, as with all resource division schemes, requires a
small amount of precoordination between users, to decide which user is
allotted which timeslot. We do not consider in this thesis the problem of how to
conduct this cooperation.

2.6.2 ... by frequency 

In schemes that share out the frequency resource, each user is allotted a
section of the frequency spectrum along which they may transmit, while
remaining quiet over all other bandwidths.

This leads again to an average per-user capacity of $c/n$.

Second-generation (GSM) mobile phone networks use a mixture of
resource division by time and by frequency to share a channel of bandwidth
25 MHz between up to 1000 transmitters. First the spectrum is shared,
giving 125 channels of bandwidth 200 kHz each. Then within each of these
subchannels, the time is divided between up to 8 transmitters in time slots of
577 μs at a time [5, Example 3.1] [40, Example 14.2].

2.6.3 ... by codeword space 

Code division multiple access (CDMA) is another way of allowing multiple 
users to use a shared channel. It works as follows. (For the purpose of simpli- 
fying this example, we shall think of a noiseless binary channel.) 

Assume each transmitter $i$ wishes to send a symbol $x_i \in \{0, 1\}$. Using
CDMA it does this over $T$ channel uses. Each transmitter $i$ is given a vector
$\mathbf{v}_i \in \{0, 1\}^T$. If that transmitter wishes to send $x_i = 1$, she instead transmits
the vector $\mathbf{v}_i$ over $T$ channel uses; if she wishes to send $x_i = 0$, she instead
transmits $\mathbf{0}$, the zero vector of length $T$. In other words, transmitter $i$ sends
$x_i \mathbf{v}_i$.

The receivers then receive the superposition $\mathbf{y} = \sum_{i=1}^n x_i \mathbf{v}_i$. If the vectors
are linearly independent (for which $T \ge n$ is necessary but not sufficient),
then each $x_i$ can be recovered.

Schemes such as this can be thought of as sharing the dimensions of the
codeword space $\{0, 1\}^T$ among the $n$ users, and so are another form of resource
division.
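The noiseless CDMA example can be sketched as follows. This sketch assumes that the superposition is ordinary integer addition of the transmitted vectors, and it decodes by brute-force search rather than by any method described in the text; the function names and the three codes are hypothetical.

```python
from itertools import product

def cdma_superpose(symbols, codes):
    """Received vector y = sum_i x_i * v_i over T noiseless channel uses."""
    T = len(codes[0])
    return [sum(x * v[t] for x, v in zip(symbols, codes)) for t in range(T)]

def cdma_decode(y, codes):
    """Brute-force search for binary symbols consistent with y."""
    n = len(codes)
    matches = [x for x in product((0, 1), repeat=n)
               if cdma_superpose(x, codes) == y]
    # Linearly independent codes give exactly one match; otherwise give up.
    return matches[0] if len(matches) == 1 else None

# Hypothetical codes for n = 3 users over T = 3 channel uses.
codes = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]
y = cdma_superpose((1, 0, 1), codes)
print(y, cdma_decode(y, codes))
```

If two users are given the same code, the search finds several consistent symbol vectors and the decoder returns `None`, which illustrates why linear independence is needed.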

2.7 Interference alignment 

Interference alignment is the name for a new class of schemes for dealing with
interference based on the following idea: if transmitters plan their signalling
correctly, interference can 'align' at each receiver, with the desired signal split
off separately. This allows receivers to share their resources just two ways -
half for the signal, and half for all the 'aligned' interference. Thus all users can
communicate at (roughly) the rate they could if there were just one interfering
link.

In particular, this means that each user can obtain 1/2 a degree of freedom,
leading to $\mathrm{dof}_\Sigma = n/2$ degrees of freedom overall. So in fact, the upper bound
(2.2) of Høst-Madsen and Nosratinia [39, Section IV] turns out to be correct,
disproving their conjecture that the lower bound $\mathrm{dof}_\Sigma = 1$ was tight. (The
conjecture was formally settled in the negative by Cadambe, Jafar, and Wang
[41].)

Like resource division, interference alignment can be performed in a num- 
ber of different ways. We show these by three toy examples, before concen- 
trating on the specific case that will be useful to us later. 

These schemes require channel state information at the transmitter (CSIT), 
in that the signal set in any time slot depends on the channel state coefficients 
at that time. 

2.7.1 ... by codeword space 

Note that CDMA (see Section 2.6.3) was, in some sense, wasteful for the
interference network, since it allowed every receiver to decode every message,
as if it were an X network, whereas in fact we only required each receiver to
decode its own message.

Interference alignment by codeword choice, due to Cadambe and Jafar
[28], develops this idea further. Each receiver receives a superposition of all
the transmitters' (faded) signals, but by 'aligning' the interference, the receiver
can work out its own signal, at less of a sacrifice than CDMA.

Consider the following 3-user fading interference network:

\[ Y_1 = x_1 + i x_2 + i x_3 + Z_1 \]
\[ Y_2 = i x_1 + x_2 + i x_3 + Z_2 \]
\[ Y_3 = i x_1 + i x_2 + x_3 + Z_3 \]

Note for this toy example we have chosen the unusual channel state matrix

\[ H = \begin{pmatrix} 1 & i & i \\ i & 1 & i \\ i & i & 1 \end{pmatrix}, \]

which has diagonal entries (corresponding to direct links) all equal to 1, and
off-diagonal entries (interfering links) all equal to $i$.

Suppose now that transmitters send their signal as just real numbers
(taking advantage of the power constraint to send at twice the power). The
receivers will receive their desired signal in the real subspace, but the
interference will be aligned in the (purely) imaginary subspace.

Thus, each user can communicate interference-free at rate
$r = \frac{1}{2} \log(1 + 2\,\mathrm{snr})$. Since we saw earlier that this was optimal at high snr for 2 users, it
must certainly be optimal for 3 users too. Hence, this interference network
has a sum-capacity of

\[ c_\Sigma = \frac{3}{2} \log(1 + 2\,\mathrm{snr}) > \log(1 + 3\,\mathrm{snr}), \]

better than the $r_\Sigma = \log(1 + 3\,\mathrm{snr})$ achievable by resource division.
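The toy example can be checked with a short sketch. Noise is switched off so that only the alignment is visible, the transmitted symbols are arbitrary real numbers, and logarithms are taken to base 2; these choices, and the function name `received`, are assumptions made for illustration.

```python
import math

# Channel matrix of the toy example: 1 on direct links, i on crosslinks.
H = [[1, 1j, 1j],
     [1j, 1, 1j],
     [1j, 1j, 1]]

def received(x, z):
    """Outputs Y_j = sum_i H[j][i] * x[i] + z[j]."""
    return [sum(H[j][i] * x[i] for i in range(3)) + z[j] for j in range(3)]

x = [0.7, -1.2, 0.4]          # real-valued transmitted symbols (arbitrary)
z = [0.0, 0.0, 0.0]           # noise switched off to isolate the alignment
y = received(x, z)

# Real part: desired signal only. Imaginary part: sum of the interferers.
for j in range(3):
    print(j + 1, y[j].real, y[j].imag)

snr = 10.0
aligned = 1.5 * math.log2(1 + 2 * snr)      # (3/2) log(1 + 2 snr)
divided = math.log2(1 + 3 * snr)            # log(1 + 3 snr)
print(aligned, divided)
```

Each receiver can therefore discard the imaginary part of its output and decode its own symbol from the real part alone.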

Cadambe and Jafar [28] managed to develop this idea, to show that it was
possible for any values of fading parameters (provided there are no
'unexpected' linear dependencies - this would be avoided almost surely if the
fading coefficients were from continuous distributions, for example).

They showed how transmitters can construct their signals so that at each 
receiver the signal is contained in one subspace of the signal space, and all the 
interference in another disjoint subspace, with both subspaces using roughly 
half of the available dimensions. 

Specifically, they showed the following [28, Theorem 1]. (Recall the defini- 
tion of degrees of freedom from Definition 1.20.) 

Theorem 2.12. Consider a Gaussian interference network with fixed fading
coefficients $h_{ji}$. Then the total number of degrees of freedom is $\mathrm{dof}_\Sigma = n/2$.
That is, for equal snrs, the sum-capacity is

\[ c_\Sigma = \frac{n}{2} \log \mathrm{snr} + o(\log \mathrm{snr}) \quad \text{as } \mathrm{snr} \to \infty. \]

Note that the per-user capacity is

\[ \frac{c_\Sigma}{n} = \frac{\frac{n}{2} \log(\mathrm{snr}) + o(\log \mathrm{snr})}{n} = \frac{1}{2} \log \mathrm{snr} + o(\log \mathrm{snr}), \]

compared with a single-user rate of

\[ c = \log(1 + \mathrm{snr}) = \log \mathrm{snr} + o(\log \mathrm{snr}), \]

so the rate has been roughly halved. This compares well to the reduction to
$1/n$ of resource division strategies, at least when $n > 2$.

El Ayach, Peters, and Heath [42] have conducted experiments that show
that this interference alignment technique can perform well in real life for
$n = 3$ users, with performance close to that predicted by theory.

2.7.2 ...by time 

If a network has time delays, we can take advantage of these delays to align 
interference in the time domain. Interference alignment by time was first con- 
sidered by Grokop, Tse and Yates [43]. However, due to the computational 
complexity of such schemes and the lack of physical applicability, it has re- 
ceived little attention since. 

Specifically, let $\tau_{ji}$ be the time delay between transmitter $i$ and receiver $j$.
Thus we have the model

\[ Y_j[t] = \sum_{i=1}^n h_{ji} x_i[t - \tau_{ji}] + Z_j[t] \]

(with the convention that $x_i[t]$ is $0$ for $t \le 0$).

Consider a toy example with the following time delays:

\[ \tau_{11} = 3 \qquad \tau_{12} = 4 \qquad \tau_{13} = 6 \]
\[ \tau_{21} = 4 \qquad \tau_{22} = 7 \qquad \tau_{23} = 2 \]
\[ \tau_{31} = 2 \qquad \tau_{32} = 8 \qquad \tau_{33} = 1 \]

Note that this has been set up so that delays on direct links are odd numbers,
while delays on crosslinks are even numbers.

This allows us to use the following strategy. Transmitters only send
symbols at the odd-numbered times, $t = 1, 3, 5, \dots$. Then at even-numbered times,
receivers will only get their desired signal (since odd + odd = even), and at
odd-numbered times, two lots of 'aligned' interference (odd + even = odd).

Hence, users can communicate at half their single-user rate.
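The parity argument can be verified directly for the delays of the toy example. The function name `arrivals` and the time horizon are hypothetical; transmitters and receivers are indexed from 0 in the code.

```python
# Delay tau[j][i] from transmitter i+1 to receiver j+1 (toy example above).
tau = [[3, 4, 6],
       [4, 7, 2],
       [2, 8, 1]]

def arrivals(j, horizon):
    """Map arrival time -> set of transmitters heard by receiver j."""
    heard = {}
    for i in range(3):
        for t in range(1, horizon, 2):          # odd transmission times only
            heard.setdefault(t + tau[j][i], set()).add(i)
    return heard

for j in range(3):
    for time, senders in sorted(arrivals(j, 20).items()):
        if time % 2 == 0:
            assert senders == {j}               # even times: desired signal only
        else:
            assert j not in senders             # odd times: interference only
print("alignment by time verified for all three receivers")
```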

Grokop, Tse, and Yates showed a generalisation of this, but it is quite
complicated: see their paper [43, Theorem 3.1] for details.

2.7.3 ... by channel state

Consider a fast fading 3-user interference network where the channel state
matrix can take either of two values, $H$ and $H'$.


Note that this toy example has been set up such that $H + H' = I \pmod 2$.

Nazer, Gastpar, Jafar, and Viswanath [44] discovered a method of
interference alignment that can code across the two channel states to recover a single
message. Transmitters send the same signal in both states, and receivers
combine two estimates to recover the desired message.

Nazer and coauthors named this scheme ergodic interference alignment. We 
investigate this further in the next section. 

2.7.4 . . . over the rational numbers 

Interference alignment by codeword space and by channel state both require
a channel which changes over time (or, equivalently, across the frequency
spectrum), while interference alignment by time requires the existence of time
delays. For some time, this left open the question of whether a form of
interference alignment could be performed over a static channel without delays.

The question was answered in the positive by Motahari, Gharan,
Maddah-Ali, and Khandani, with a scheme they call real interference alignment [45].

The strategy works by effectively 'vectorising' the channel. Specifically, we
can treat the real numbers $\mathbb{R}$ as a vector space over the rational numbers $\mathbb{Q}$.
Then we can treat a real signal $x \in \mathbb{R}$ as a vector $x = \sum_k \lambda_k v_k$, where $\lambda_k \in \mathbb{Q}$
and the $v_k$ are some basis real numbers.

Using theorems about Diophantine rational approximations to real
numbers, Motahari and coauthors deduce that real interference alignment achieves
$n/2$ total degrees of freedom for the Gaussian interference channel.

Interested readers are referred to the original paper for further details [45].

2.8 Ergodic interference alignment 

It's easiest to analyse ergodic interference alignment by first looking at the
finite field channel. For convenience, we will assume that the fast-fading
coefficients are IID uniform on $\mathbb{F}_q \setminus \{0\}$.

Recall from Theorem 1.12 that the single-user capacity of the finite field
channel with non-zero fading is $D(Z) := \log q - \mathbb{H}(Z)$.

The main lemma that gets ergodic interference alignment to work is the
following [46, Theorem 1 and Corollary 2]. It is based on this observation:
although receiver $j$ would normally wish to reconstruct just its own message
$\mathbf{m}_j$, it is, in fact, easier to reconstruct the 'pseudomessage'
$\hat{\mathbf{m}}_j := \sum_{i=1}^n H_{ji} \mathbf{m}_i$.

Lemma 2.13. Let the message set $\mathcal{M}$ consist of column vectors over $\mathbb{F}_q$. Consider
the finite field interference network. Then each receiver $j$ can decode the linear
combination of messages $\hat{\mathbf{m}}_j = \sum_{i=1}^n H_{ji} \mathbf{m}_i$ at rate $D(Z)$.

Proof. The key here is for all transmitters to use the same linear code. Let
the generator matrix of this code be $G$. Write $\mathbf{g}[t]$ for the $t$th row of $G$, so
$X_i[t] = \mathbf{g}[t] \mathbf{m}_i$ (since $\mathbf{m}_i$ is a column vector). Then each receiver sees signal

\[ Y_j[t] = \sum_{i=1}^n H_{ji} X_i[t] + Z_j[t] \]
\[ = \sum_{i=1}^n H_{ji} (\mathbf{g}[t] \mathbf{m}_i) + Z_j[t] \]
\[ = \mathbf{g}[t] \Big( \sum_{i=1}^n H_{ji} \mathbf{m}_i \Big) + Z_j[t] \]
\[ = \mathbf{g}[t] \hat{\mathbf{m}}_j + Z_j[t]. \]

But this is precisely as if the single message $\hat{\mathbf{m}}_j$ was sent with the linear
code. Since very good linear codes exist (Theorem 1.15), this can be done at
rates up to the single-user capacity $c = D(Z)$. □

The technique proceeds as follows: Match a state $H$ with the
complementary state $H' = I - H$. Transmitters send the same signal (encoding the same
message) in both states. Then after $T$ occurrences of the first state receiver $j$
decodes $\hat{\mathbf{m}}_j = \sum_{i=1}^n H_{ji} \mathbf{m}_i$ at rate $D(Z)$ and after $T$ occurrences of the
complementary state decodes

\[ \hat{\mathbf{m}}'_j = \sum_{i=1}^n H'_{ji} \mathbf{m}_i = \sum_{i=1}^n (\delta_{ji} - H_{ji}) \mathbf{m}_i, \]

also at rate $D(Z)$. Receiver $j$ then computes the message estimate

\[ \hat{\mathbf{m}}_j + \hat{\mathbf{m}}'_j = \sum_{i=1}^n (H_{ji} + \delta_{ji} - H_{ji}) \mathbf{m}_i = \sum_{i=1}^n \delta_{ji} \mathbf{m}_i = \mathbf{m}_j, \]

as desired. Since decoding the message required twice the blocklength, the
rate is half what it would be, $(\log |\mathcal{M}|)/2T = D(Z)/2$.
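The algebra of combining the two pseudomessages can be checked with a small noiseless sketch over a prime field. The values of q, n and the message length k, the random seed, and the function name `pseudomessage` are all hypothetical; the sketch only checks the arithmetic of the final step and does not model noise or the coding itself. Entries of the complementary state are allowed to be zero here.

```python
import random

q = 5          # field size (a prime, so arithmetic mod q is a field)
n = 3          # number of users
k = 4          # message length; an illustrative choice

random.seed(0)
messages = [[random.randrange(q) for _ in range(k)] for _ in range(n)]
H = [[random.randrange(1, q) for _ in range(n)] for _ in range(n)]
# Complementary state H' = I - H (mod q).
H_comp = [[((1 if i == j else 0) - H[j][i]) % q for i in range(n)]
          for j in range(n)]

def pseudomessage(state, j):
    """Linear combination sum_i state[j][i] * m_i (mod q) seen by receiver j."""
    return [sum(state[j][i] * messages[i][c] for i in range(n)) % q
            for c in range(k)]

for j in range(n):
    first = pseudomessage(H, j)          # decoded during state H
    second = pseudomessage(H_comp, j)    # decoded during state H'
    estimate = [(a + b) % q for a, b in zip(first, second)]
    assert estimate == messages[j]
print("each receiver recovers its own message")
```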

(Observe that the receiver needs to perform two separate estimates, and
cannot simply add the channel outputs together. To do this would lead to
a channel of the form $Y = X_j + Z * Z$, where the convolution $Z * Z$ means
the sum of two IID copies of $Z$. The overall rate in this case is
$D(Z * Z)/2 \le D(Z)/2$, with strict inequality unless $Z$ is deterministic or $D(Z) = 0$
[47]. In general, the $K$-fold convolution has relative entropy from the uniform
$D(Z * \cdots * Z)$ that usually decreases exponentially in $K$ [47].)

Nazer and coauthors use a typical set argument to show that sufficiently 
many channel states can be matched up in this way, showing that with high 
probability each matrix and its complement show up almost the same number 
of times. They use this to prove the following theorem [44, Lemma 3 and 
Theorem 2]: 

Theorem 2.14. For the model as outlined above, the rates $r_i = D(Z)/2$ are
simultaneously achievable.

Using a quantisation argument, they show the following [44, Theorem 3]:

Theorem 2.15. For the fast fading Gaussian interference channel with symmetric
fading (that is, $H$ and $-H$ have the same distribution), the rates
$r_i = \frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$ are simultaneously achievable.

We will use this result more in Chapter 4. 
Notes 

The section consists of a review of the existing literature; the mathematical
content is not claimed to be new.

On networks, the textbook of Cover and Thomas [6, Chapter 15] and the 
review papers of El Gamal and Cover [48] and Kramer [49] were useful. 

The first detailed study of the point-to-point link was by Shannon [4], of 
the multiple access network was by Ahlswede [21] and Liao [22], of the broad- 
cast network was by Cover [24], and of the interference network was by Car- 
leial [50]. 

The strategies of interference as noise, decode-and-subtract, and resource
division are old and well-known, making tracking down details of their
discovery difficult. The textbooks by Tse and Viswanath [5] and by Goldsmith
[40] give good background on this material.

The concept of interference alignment - first discovered in the alignment 
by codeword choice paradigm - is due to Cadambe and Jafar [28], and also to 
Maddah-Ali, Motahari, and Khandani [51], who independently discovered a 
similar method (published in the same issue of the same journal). 

The toy example of interference alignment by codeword choice is due to 
Jafar [52]. The example of interference alignment by time is after Grokop, Tse, 
and Yates [43]. Ergodic interference alignment was discovered by Nazer and 
coauthors [44]. 

A tutorial by Jafar [52] was useful for the material on interference
alignment.

Regular and Poisson random networks

In this chapter, we look at two networks based on models of how nodes are 
positioned in space. In the regular network, nodes are positioned at regular 
spacings, as in a grid; in the Poisson random network, nodes are positioned at 
random according to a Poisson point process. 

We examine how large networks can operate using simple 'interference as
noise' techniques. In particular, we show the important relationship between
the attenuation $\alpha$ of the signals (which describes how quickly signals die off
over distance) and the dimension $d$ of the network.

3.1 Model 

We use the model of a Gaussian network with slow fading based on
power-law attenuation. That is, we have a countable set of points (nodes) placed in
$d$-dimensional Euclidean space $\mathbb{R}^d$ - each node $i \in \mathbb{Z}_+$ is positioned at the
point $T_i \in \mathbb{R}^d$.

On the $t$th channel use, the signal received by node $j$ is

\[ Y_j[t] = \sum_{i \ne j} \sqrt{h}\, \rho(i, j)^{-\alpha/2} x_i[t] + Z_j[t]. \]

Here, $\rho(i, j) = \|T_j - T_i\|$ is the Euclidean distance between nodes $i$ and $j$. We
call $h$ the fixed fading parameter. Large $h$ corresponds to signals being much
more powerful than noise; small $h$ corresponds to signals being much less
powerful than noise. To concentrate on the interference-limited regime, we
will sometimes consider the limit $h \to \infty$, which is equivalent to the noiseless
network with $h = 1$.

The Euclidean norm in $\mathbb{R}^d$ will be denoted $\| \cdot \|$, where $d$ is the dimension
of the network. It will be useful later to define

\[ v(d) := \frac{\pi^{d/2}}{\Gamma(1 + d/2)}, \]

which is the volume of the Euclidean unit ball in $\mathbb{R}^d$.
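As a quick check of this formula, the following sketch evaluates $v(d)$ in low dimensions, where the values 2, $\pi$ and $4\pi/3$ are familiar. The function name `unit_ball_volume` is a hypothetical choice.

```python
import math

def unit_ball_volume(d):
    """Volume v(d) = pi^(d/2) / Gamma(1 + d/2) of the unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)

for d in (1, 2, 3):
    print(d, unit_ball_volume(d))
```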

The $d = 2$ case is the most commonly studied, as this obviously has
real-world applications. The case $d = 3$ has applications in, for example, tall office
buildings; and the $d = 1$ case is attracting more attention for 'car-to-car' protocols
(see for example recent work by the US Department of Transportation [53]),
where a long road can be modelled as a one-dimensional line.

3.2 Regular networks 

In a $d$-dimensional regular network (see for example Xie and Kumar [54]),
nodes are placed at points of $\mathbb{Z}^d$ in $d$-dimensional space $\mathbb{R}^d$. In particular,
any two nodes are a distance at least 1 from each other.

Below we show regular networks in one and two dimensions respectively.

[Figure: regular networks in one and two dimensions.]

Each node $\mathbf{t} \in \mathbb{Z}^d$ is a transmitter, transmitting to one of its $2d$ nearest
neighbours, chosen arbitrarily. We write $\mathbf{t} \to \mathbf{r}$ to indicate that node $\mathbf{t}$
transmits to node $\mathbf{r}$.

All nodes will use standard Gaussian codebooks of power $P = 1$,
generated independently of each other. This means that - power aside - signals are
statistically indistinguishable from a) each other, and b) the background noise.
We use the principle that interference should be treated as noise (as in Section
2.3).

Following Gupta and Kumar [35], the interference at any node $\mathbf{r} \in \mathbb{Z}^d$ is

\[ I := h \sum_{\mathbf{t}} \|\mathbf{t} - \mathbf{r}\|^{-\alpha}, \qquad (3.1) \]

where the sum runs over the interfering transmitters $\mathbf{t}$, and using
interference-as-noise the communication rate

\[ r := \log(1 + \mathrm{sinr}) = \log\Big(1 + \frac{h}{1 + I}\Big) \]

is achievable for each link $\mathbf{t} \to \mathbf{r}$.

That is, if the interference $I$ is finite, then every node can transmit at this
fixed rate. This is known as linear growth for the following reason: if we have
a sequence of sets of nodes $(S_n : n \in \mathbb{N})$, where $S_n \subset \mathbb{Z}^d$ is of cardinality $n$,
then there exists an achievable rate $n$-tuple $(r_i : i \in S_n)$ such that

\[ \sum_{i \in S_n} r_i = nr = \Theta(n). \]

The following theorem generalises the work of Xie and Kumar [54], who
proved it for the case $d = 2$.

Theorem 3.1. The $d$-dimensional regular network supports linear growth, provided
the ratio of the attenuation to the dimension of the network is sufficiently large,
specifically if $\alpha > d$.

Proof. We proceed by induction on the dimension $d$, showing that the interference
$I = I(\alpha, d)$ is finite for $\alpha > d$. Without loss of generality, receiver $0$ is
receiving a message from transmitter $t^* := (1, 0, \dots, 0)$.
First, the base case, $d = 1$. The interference is

\[ I(\alpha, 1) = h \sum_{t \in \mathbb{Z} \setminus \{0, 1\}} |t|^{-\alpha} = h \left( 2 \sum_{t=1}^{\infty} t^{-\alpha} - 1 \right), \]

which is finite for $\alpha > 1$, as desired.

The inductive hypothesis is that $I(\alpha, d - 1)$ is finite.
Now the inductive step. Again, the interference is

\[ I(\alpha, d) = h \sum_{t \in \mathbb{Z}^d \setminus \{0, t^*\}} \|t\|^{-\alpha} < h \sum_{t \in \mathbb{Z}^d \setminus \{0\}} \|t\|^{-\alpha}. \]

We now split $\mathbb{Z}^d$ into the $d$ different $(d-1)$-dimensional coordinate spaces
(where at least one coordinate is 0), and the $2^d$ open orthants (where all coordinates
are nonzero). This gives

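\[ I(\alpha, d) < d\, I(\alpha, d-1) + h 2^d \sum_{t \in \mathbb{N}^d} \|t\|^{-\alpha}. \tag{3.2} \]

The first term in (3.2) is finite by the inductive hypothesis; we concentrate on
the second term. By treating the point $\mathbf{1} = (1, 1, \dots, 1)$ separately, we have

\[ h 2^d \sum_{t \in \mathbb{N}^d} \|t\|^{-\alpha} = h 2^d d^{-\alpha/2} + h 2^d \sum_{t \in \mathbb{N}^d \setminus \{\mathbf{1}\}} \|t\|^{-\alpha}. \tag{3.3} \]

The second term in (3.3) can be approximated by an integral, since for $t \in \mathbb{N}^d$,

\[ \|t\|^{-\alpha} < \int_{t_1 - 1}^{t_1} \cdots \int_{t_d - 1}^{t_d} \|u\|^{-\alpha} \, du_1 \cdots du_d. \]

Hence,

\begin{align*}
h 2^d \sum_{t \in \mathbb{N}^d \setminus \{\mathbf{1}\}} \|t\|^{-\alpha}
  &< h 2^d \int \cdots \int_{\mathbb{R}_+^d \setminus [0,1]^d} \|t\|^{-\alpha} \, dt_1 \cdots dt_d \\
  &= h \int_{\mathbb{R}^d \setminus [-1,1]^d} \|t\|^{-\alpha} \, dt \\
  &< h \int_{\mathbb{R}^d \setminus B(0,1)} \|t\|^{-\alpha} \, dt,
\end{align*}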

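where $B(0, 1)$ is the d-dimensional unit ball centered at the origin. We now
use a change of coordinates to $\rho = \|t\|$, $s = t/\rho$, so that $dt = \rho^{d-1} \, d\rho \, ds$. This
gives

\begin{align*}
h 2^d \sum_{t \in \mathbb{N}^d \setminus \{\mathbf{1}\}} \|t\|^{-\alpha}
  &< h \int_{s \in \partial B(0,1)} \int_{\rho = 1}^{\infty} \rho^{-\alpha} \rho^{d-1} \, d\rho \, ds \\
  &= h\, d\, v(d) \int_{\rho = 1}^{\infty} \rho^{-(\alpha - d) - 1} \, d\rho \\
  &= h\, d\, v(d) \left[ -\frac{\rho^{-(\alpha - d)}}{\alpha - d} \right]_{\rho = 1}^{\infty} \\
  &= \frac{h\, d\, v(d)}{\alpha - d}, \tag{3.4}
\end{align*}

which is finite. Putting together (3.2), (3.3), and (3.4), we get

\[ I(\alpha, d) < d\, I(\alpha, d-1) + h 2^d d^{-\alpha/2} + \frac{h\, d\, v(d)}{\alpha - d} < \infty. \]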

The inductive step is complete and the theorem is proven. □ 
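To make the finiteness claim concrete, the following minimal sketch sums the interference at the origin over a finite cube of lattice points and evaluates the interference-as-noise rate of a unit-length link. The function names, the truncation parameter `radius` and the use of base-2 logarithms are our own assumptions for illustration; the truncated sum only approximates the infinite sum $I(\alpha, d)$ from below.

```python
import itertools
import math


def lattice_interference(alpha, d, radius, h=1.0):
    """Truncated interference at the origin of the regular network Z^d.

    Sums h * ||t||^(-alpha) over lattice points t whose coordinates all lie
    in [-radius, radius], excluding the receiver 0 and its transmitter
    t* = (1, 0, ..., 0).
    """
    t_star = (1,) + (0,) * (d - 1)
    origin = (0,) * d
    total = 0.0
    for t in itertools.product(range(-radius, radius + 1), repeat=d):
        if t == origin or t == t_star:
            continue
        total += h * math.sqrt(sum(c * c for c in t)) ** (-alpha)
    return total


def rate(alpha, d, radius, h=1.0):
    """Interference-as-noise rate log2(1 + h / (I + 1)) for a unit-length link."""
    return math.log2(1 + h / (lattice_interference(alpha, d, radius, h) + 1))


# Base case d = 1 with alpha = 2: the sum tends to 2 * zeta(2) - 1 = pi^2 / 3 - 1.
print(lattice_interference(2, 1, 2000), math.pi ** 2 / 3 - 1)
# The common case d = 2, alpha = 3: the partial sums settle down as the cube grows.
print(lattice_interference(3, 2, 20), lattice_interference(3, 2, 40))
```

For $\alpha > d$ the printed partial sums stabilise as `radius` grows, in line with Theorem 3.1; for $\alpha$ at or below $d$ they keep growing.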

Note that this result is the best possible. (By 'best possible', we mean that
Theorem 3.1 is not true for $\alpha \le d$. We do not claim our bounds on $I(\alpha, d)$ are
as tight as possible.) If $\alpha \le d$, then the interference sum diverges,

\[ I(\alpha, d) = \infty, \]

and the signal-to-interference-plus-noise ratio is 0.

It is worth noting that a simple bound for $I(\alpha, d)$ is

lU,d) <h^^2'^-\d + iy.. 

DC — d 

This can be proved inductively, using $v(d) < 2^d$ (that is, the volume of the
unit ball is less than that of the surrounding cube).

3.3 Poisson random networks 

In this section, we define the Poisson random network, where nodes are dis- 
tributed like a Poisson process. We give a local result - bounds on the outage 
probability of a single transmission - and a global result - showing that linear 
growth occurs in the network with high probability. 

3.3.1 Node positioning model 

In a d-dimensional Poisson random network (studied extensively by Haenggi
[55, 56], and Dhillon, Ganti, and Andrews [57], among others), the nodes
$\{T_i : i \in \mathbb{Z}_+\}$ are placed in $\mathbb{R}^d$ as a Poisson point process of density 1 (without
loss of generality). For simplicity, we will translate the points such that $T_0$ is
at the origin $0$, and relabel the nodes by order of distance from the origin. So
we have $0 = \|T_0\| \le \|T_1\| \le \cdots$ (and strict inequalities almost surely).

The figure below shows Poisson random networks in one and two dimensions.



We want to model the scenario of a multihop network; that is, where messages
are sent to distant nodes by being successively passed over short distances
by a number of intermediate nodes. For this model, we shall assume
that each node broadcasts its intended signal, whilst picking up the signal
from its nearest neighbour. We will concentrate on the communication over
just these short hops - the large-scale strategy has been studied by many others,
particularly the multihop strategy of Gupta and Kumar [35] and the work
on hierarchical cooperation by Özgür, Lévêque, and Tse [37].

If node $i$'s nearest neighbour is node $j$, we again write $j \to i$. In particular,
the nearest neighbour to node 0, at the origin, is node 1 at $T_1$, so $1 \to 0$.

All nodes will use standard Gaussian codebooks of power $P = 1$, generated
independently of each other. Nodes treat signals other than the one they are
picking up as Gaussian noise, as discussed in Section 2.3.

3.3.2 Outage probability 

Given the link $j \to i$ and the positions of nodes $i$ and $j$, the position of other
nodes in the network is random. In particular, for any given rate $r > 0$, we
cannot guarantee that $j$ can communicate to $i$ at rate $r$. This is because the
other nodes might (with non-zero probability) crowd round $i$, drowning out
the intended signal from $j$.

Hence, we need to study the outage probability of the network (as discussed 
previously in Subsection 1.5.3). 

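Definition 3.2. We define the outage probability as

\[ p_{\mathrm{out}}(r) := \mathbb{P}\bigl(r > \log(1 + \mathrm{SINR}_{ji})\bigr). \]

Here (and throughout) $\mathbb{P}$ denotes probability over the Poisson point process.

The signal-to-interference-plus-noise ratio at node $j$, $\mathrm{SINR}_{ji}$, has marginal
distribution function

\[ F_{\mathrm{SINR}}(s) := \mathbb{P}(\mathrm{SINR}_{ji} \le s), \qquad \mathrm{SINR}_{ji} = \frac{h \|T_j - T_i\|^{-\alpha}}{I_{ji} + 1}, \]

independent of the link $i \to j$, where the interference is

\[ I_{ji} = \sum_{k \ne i, j} h \|T_j - T_k\|^{-\alpha}. \]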

From now on, we deal with the link $1 \to 0$, without loss of generality. We
will suppress subscripts that are no longer necessary.

An important special case is the high power regime $h \to \infty$, which is simpler
to deal with mathematically and is important for studying networks that
are interference limited rather than noise limited. Note that since

\[ \lim_{h \to \infty} \mathrm{SINR} = \lim_{h \to \infty} \frac{h \|T_j - T_i\|^{-\alpha}}{\sum_{k \ne i, j} h \|T_j - T_k\|^{-\alpha} + 1} = \frac{\|T_j - T_i\|^{-\alpha}}{\sum_{k \ne i, j} \|T_j - T_k\|^{-\alpha}}, \]

this is equivalent to taking $h = 1$ and ignoring the noise term.

The next two theorems give upper (Theorem 3.3) and lower (Theorem
3.4) bounds on the outage probability $p_{\mathrm{out}}(r)$. The figure below shows these
bounds for the common case $d = 2$, $\alpha = 3$ in the high power $h \to \infty$ regime.

[Figure: upper and lower bounds on the outage probability $p_{\mathrm{out}}(r)$.]




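Theorem 3.3. Consider a d-dimensional Poisson random network with fixed fading
parameter $h$, and suppose $\alpha > d$. Then the outage probability is upper-bounded by

\[ p_{\mathrm{out}}(r) \le \frac{d (2^r - 1)}{\alpha - d} \left( 1 + 2 \, \frac{2^r - 1}{h} \, v(d)^{-\alpha/d} \, \Gamma\!\left(2 + \frac{\alpha}{d}\right) \right) + \exp\left( -v(d) \left( 2 \, \frac{2^r - 1}{h} \right)^{-d/\alpha} \right). \]

In the high power $h \to \infty$ regime, we have

\[ p_{\mathrm{out}}(r) \le \frac{d (2^r - 1)}{\alpha - d} \left( 1 - \exp\left( -\frac{\alpha - d}{d (2^r - 1)} \right) \right). \]

Note that since

\[ \frac{d}{\alpha - d} = \frac{1}{\alpha/d - 1}, \]

the attenuation $\alpha$ and dimension $d$ only enter this theorem through their ratio.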

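Proof. First, note that the outage probability can be rewritten as

\[ p_{\mathrm{out}}(r) = \mathbb{P}\bigl(r > \log(1 + \mathrm{SINR})\bigr) = F_{\mathrm{SINR}}(s), \]

where we have defined $s := 2^r - 1$. So now we need only bound $F_{\mathrm{SINR}}(s)$, the
probability that the SINR is small.

We will bound $F_{\mathrm{SINR}}(s)$ by conditioning on the position of the nearest
neighbour. So

\begin{align*}
F_{\mathrm{SINR}}(s) &= \mathbb{P}(\mathrm{SINR} \le s) \\
  &= \int_0^\infty f_{\|T_1\|}(t) \, \mathbb{P}\bigl(\mathrm{SINR} \le s \bigm| \|T_1\| = t\bigr) \, dt \\
  &= \int_0^\infty f_{\|T_1\|}(t) \, \mathbb{P}\Bigl(I \ge \tfrac{1}{s} h t^{-\alpha} - 1 \Bigm| \|T_1\| = t\Bigr) \, dt. \tag{3.5}
\end{align*}

It is known [58, Theorem 1] (and can be easily shown) that $\|T_1\|^d$ has an
exponential distribution with parameter $v(d)$,

\[ \|T_1\|^d \sim \mathrm{Exp}(v(d)), \]

giving

\[ f_{\|T_1\|}(t) = d\, v(d)\, t^{d-1} e^{-v(d) t^d}. \tag{3.6} \]

We will also need to bound the probability that the interference is large,
which we do using the conditional Markov inequality. Specifically, when
$\tfrac{1}{s} h t^{-\alpha} - 1 > 0$, we have

\[ \mathbb{P}\Bigl(I \ge \tfrac{1}{s} h t^{-\alpha} - 1 \Bigm| \|T_1\| = t\Bigr) \le \min\left\{ \frac{\mathbb{E}(I \mid \|T_1\| = t)}{\tfrac{1}{s} h t^{-\alpha} - 1}, \, 1 \right\}, \tag{3.7} \]

and when $\tfrac{1}{s} h t^{-\alpha} - 1 \le 0$, we take the trivial bound 1 instead.

We concentrate on the simpler high power scenario first. Note that in the
limit $h \to \infty$, we always have $\tfrac{1}{s} h t^{-\alpha} - 1 > 0$.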

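For the moment, we concentrate on the first argument. We can write

\[ \mathbb{E}(I \mid \|T_1\| = t) = \mathbb{E} \sum_{T \in \mathcal{P}(t)} h \|T\|^{-\alpha}, \]

where $\mathcal{P}(t)$ is the set of points of the Poisson process outside the ball of radius
$t$ about the origin. Using Campbell's theorem (see for example the monograph
of Haenggi and Ganti [55, Theorem A.2]), we then have

\begin{align*}
\mathbb{E} \sum_{T \in \mathcal{P}(t)} h \|T\|^{-\alpha}
  &= \int_{\mathbb{R}^d \setminus B(0, t)} h \|u\|^{-\alpha} \, du \\
  &= h\, d\, v(d) \int_{\rho = t}^{\infty} \rho^{-(\alpha - d) - 1} \, d\rho \\
  &= h \frac{d\, v(d)}{\alpha - d} \, t^{-(\alpha - d)}. \tag{3.8}
\end{align*}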

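Substituting (3.8) and (3.6) back into equation (3.5) gives

\begin{align*}
F_{\mathrm{SINR}}(s)
  &\le \lim_{h \to \infty} \int_0^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \min\left\{ \frac{1}{\tfrac{1}{s} h t^{-\alpha} - 1} \, h \frac{d\, v(d)}{\alpha - d} \, t^{-(\alpha - d)}, \, 1 \right\} dt \\
  &= \int_0^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \min\left\{ s t^{\alpha} \frac{d\, v(d)}{\alpha - d} \, t^{-(\alpha - d)}, \, 1 \right\} dt \\
  &= \int_0^{t^*} d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, s t^{\alpha} \frac{d\, v(d)}{\alpha - d} \, t^{-(\alpha - d)} \, dt
     + \int_{t^*}^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, dt,
\end{align*}

where

\[ t^* = \left( \frac{s\, d\, v(d)}{\alpha - d} \right)^{-1/d} \]

is the point where we cross over from one argument of the 'min' to the other.
Calculating these integrals using the substitution $y = v(d) t^d$ gives

\[ F_{\mathrm{SINR}}(s) \le \frac{d s}{\alpha - d} \left( 1 - e^{-(\alpha - d)/ds} \right). \]

This proves the theorem for the high power regime.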

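We now move back to the general case. Returning to (3.7), we now have

\[ F_{\mathrm{SINR}}(s) \le \int_0^{t^{**}} f_{\|T_1\|}(t) \min\left\{ \frac{\mathbb{E}(I \mid \|T_1\| = t)}{\tfrac{1}{s} h t^{-\alpha} - 1}, \, 1 \right\} dt + \int_{t^{**}}^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, dt, \]

where $t^{**} = (s/h)^{-1/\alpha}$ is the point at which $\tfrac{1}{s} h t^{-\alpha} - 1 = 0$.

We could evaluate this integral numerically, or express it as a complicated
sum of Gamma functions. Instead, we will show a simple bound.

When $t < 2^{-1/\alpha} t^{**}$, we have $\tfrac{s}{h} t^{\alpha} < \tfrac{1}{2}$, and hence

\[ \frac{1}{\tfrac{1}{s} h t^{-\alpha} - 1} = \frac{\tfrac{s}{h} t^{\alpha}}{1 - \tfrac{s}{h} t^{\alpha}} \le \frac{s}{h} t^{\alpha} \left( 1 + 2 \frac{s}{h} t^{\alpha} \right). \]

This allows us to bound the integral by

\begin{align*}
&\int_0^{2^{-1/\alpha} t^{**}} d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, s \left( 1 + 2 \frac{s}{h} t^{\alpha} \right) t^{\alpha} \frac{d\, v(d)}{\alpha - d} \, t^{-(\alpha - d)} \, dt
  + \int_{2^{-1/\alpha} t^{**}}^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, dt \\
&\qquad \le \frac{d s}{\alpha - d} \left( 1 + 2 \frac{s}{h} \, v(d)^{-\alpha/d} \, \Gamma\!\left( 2 + \frac{\alpha}{d} \right) \right) + \exp\left( -v(d) \left( 2 \frac{s}{h} \right)^{-d/\alpha} \right),
\end{align*}

as required, where the first term follows upon replacing the upper limit of the
integral by $\infty$. □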



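Theorem 3.4. Under the same conditions as Theorem 3.3, the outage probability is
bounded below by

\[ p_{\mathrm{out}}(r) \ge 1 - \frac{1}{(2^r - 1)^{d/\alpha}}. \]

Proof. The key is to observe that the interference $I$ is at least as large as the
contribution coming from the second-nearest neighbour $T_2$. That is,

\begin{align*}
\mathbb{P}\Bigl(I \ge \tfrac{1}{s} h t^{-\alpha} - 1 \Bigm| \|T_1\| = t\Bigr)
  &\ge \mathbb{P}\Bigl(I \ge \tfrac{1}{s} h t^{-\alpha} \Bigm| \|T_1\| = t\Bigr) \\
  &\ge \mathbb{P}\Bigl(h \|T_2\|^{-\alpha} \ge \tfrac{1}{s} h t^{-\alpha} \Bigm| \|T_1\| = t\Bigr).
\end{align*}

Rearranging this, we get

\begin{align*}
\mathbb{P}\Bigl(h \|T_2\|^{-\alpha} \ge \tfrac{1}{s} h t^{-\alpha} \Bigm| \|T_1\| = t\Bigr)
  &= \mathbb{P}\Bigl(\|T_2\|^{-\alpha} \ge \tfrac{1}{s} t^{-\alpha} \Bigm| \|T_1\| = t\Bigr) \\
  &= 1 - \mathbb{P}\bigl(\|T_2\| > t s^{1/\alpha} \bigm| \|T_1\| = t\bigr) \\
  &= 1 - \exp\bigl( -v(d) \bigl( (t s^{1/\alpha})^d - t^d \bigr) \bigr),
\end{align*}

where the final result follows on considering the probability that the annulus
$\{u \in \mathbb{R}^d : t < \|u\| < t s^{1/\alpha}\}$ is empty.

Again, combining this with Equations (3.5) and (3.6), we obtain a lower
bound on the outage probability of the form

\begin{align*}
&\int_0^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \left( 1 - e^{-v(d) t^d (s^{d/\alpha} - 1)} \right) dt \\
&\qquad = \int_0^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d} \, dt - \int_0^\infty d\, v(d)\, t^{d-1} e^{-v(d) t^d s^{d/\alpha}} \, dt,
\end{align*}

and the result follows on making the change of variables $y = v(d) t^d$. □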
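The two bounds can be compared against simulation in the high power regime. The sketch below is illustrative only and rests on several assumptions of ours: logarithms are taken to base 2 (so $s = 2^r - 1$), the function names are hypothetical, and the simulation uses the standard fact that for a unit-density Poisson process the quantities $v(d)\|T_k\|^d$ form a unit-rate Poisson process on the half-line. Interferers beyond the first `n_points` nodes are replaced by their conditional mean, which is an approximation.

```python
import math
import random


def outage_upper_high_power(r, d, alpha):
    """High-power upper bound (ds/(alpha-d)) (1 - exp(-(alpha-d)/(ds))), s = 2^r - 1."""
    s = 2.0 ** r - 1.0
    y = (alpha - d) / (d * s)
    return (1.0 - math.exp(-y)) / y


def outage_lower(r, d, alpha):
    """Lower bound 1 - s^(-d/alpha), clipped at 0 (it is only informative for s >= 1)."""
    s = 2.0 ** r - 1.0
    return max(0.0, 1.0 - s ** (-d / alpha))


def outage_monte_carlo(r, d, alpha, trials=4000, n_points=100, seed=0):
    """Estimate the high-power outage probability of the link 1 -> 0 (requires r > 0)."""
    rng = random.Random(seed)
    s = 2.0 ** r - 1.0
    c = alpha / d
    outages = 0
    for _ in range(trials):
        # gamma = v(d) * ||T_k||^d are arrival times of a unit-rate Poisson
        # process, so ||T_k||^(-alpha) is proportional to gamma^(-c); the
        # common factor v(d)^c cancels in the signal-to-interference ratio.
        gamma = rng.expovariate(1.0)
        signal = gamma ** (-c)
        interference = 0.0
        for _ in range(n_points - 1):
            gamma += rng.expovariate(1.0)
            interference += gamma ** (-c)
        # Approximation: replace the remaining, far-away interferers by their mean.
        interference += gamma ** (1.0 - c) / (c - 1.0)
        if signal <= s * interference:
            outages += 1
    return outages / trials


# The common case d = 2, alpha = 3 at rate r = 2 bits per channel use.
print(outage_lower(2, 2, 3), outage_monte_carlo(2, 2, 3), outage_upper_high_power(2, 2, 3))
```

Up to Monte Carlo noise, the simulated value should fall between the two bounds.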
3.3.3 Linear growth 

Recall that we say that linear growth occurs when the sum-rate of $n$ users scales
linearly with $n$. That is, if we have a sequence of sets of nodes $(S_n : n \in \mathbb{N})$,
where $S_n \subset \mathbb{Z}_+$ is of cardinality $n$, then there exists an achievable rate n-tuple
$(r_{i \to j} : i \in S_n)$ such that $\sum_{i \in S_n} r_{i \to j} = O(n)$.

In particular if a 'proportion' $p$ of links were to support a given rate $r$, then
we would have

\[ \sum_{i \in S_n} r_{i \to j} \approx pnr = O(n), \]

which would be sufficient to show linear growth.

In particular, if communication on distinct links were independent, this
would be true with $p = 1 - p_{\mathrm{out}}(r)$. Although the links are not independent,
links 'far enough away' are. Thus by splitting the network into 'close' and
'distant' nodes we can prove the theorem.

Theorem 3.5. Consider a d-dimensional Poisson random network with attenuation
$\alpha$ in the interference-limited $h \to \infty$ regime. Suppose $\alpha > d$. Then we have linear
growth with probability tending to 1 as $n \to \infty$ at rate $O(n^{-(1 - d/\alpha)})$.

The following Chernoff-type bound on Poisson random variables will be
useful later.

Lemma 3.6. Let $X \sim \mathrm{Po}(\lambda)$. Then $\mathbb{P}(X \ge e\lambda) \le e^{-\lambda}$.

Proof. We have, using Markov's inequality,

\[ \mathbb{P}(X \ge e\lambda) = \mathbb{P}(e^X \ge e^{e\lambda}) \le \frac{\mathbb{E} e^X}{e^{e\lambda}} = \frac{e^{(e-1)\lambda}}{e^{e\lambda}} = e^{-\lambda}. \]

□
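Lemma 3.6 is easy to check numerically. The following minimal sketch (the function name is ours) computes the exact Poisson upper tail by summing the probability mass function and compares it with $e^{-\lambda}$ for a few values of $\lambda$.

```python
import math


def poisson_upper_tail(lam, threshold):
    """P(X >= threshold) for X ~ Po(lam), computed as 1 - P(X <= ceil(threshold) - 1)."""
    k_min = math.ceil(threshold)
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for k in range(k_min):
        cdf += term
        term *= lam / (k + 1)  # P(X = k + 1) from P(X = k)
    return max(0.0, 1.0 - cdf)


for lam in (0.5, 1.0, 2.0, 5.0, 10.0):
    tail = poisson_upper_tail(lam, math.e * lam)
    print(lam, tail, math.exp(-lam), tail <= math.exp(-lam))
```

The bound is far from tight, but it is all the proof of Theorem 3.5 requires.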

We are now in the position to prove Theorem 3.5. 

Proof of Theorem 3.5. First recall that $h \to \infty$ is equivalent to taking $h = 1$ and
deleting the noise term.

Let $n$ be some fixed integer. (Later, we will consider the sum rate of $n$
nodes, and let $n \to \infty$.) Fix the nearest-neighbour link $i \to j$.

Given a node $j$ situated at the point $T_j$, we divide the other nodes into
those close to $j$,

\[ C(j) := \{k \ne i, j : \|T_j - T_k\| \le n^a\}, \]

and those distant from $j$,

\[ D(j) := \{k \ne i : \|T_j - T_k\| > n^a\}, \]

where $a$ is some parameter to be chosen later.

Outage occurs when the signal-to-interference-plus-noise ratio is insufficient
to support some given rate $r = \log(1 + s)$. We will consider whether
an outage event is caused primarily by close or distant nodes. Specifically, we
define the events

\[ \mathrm{Out}_C(j) := \left\{ \frac{\|T_i - T_j\|^{-\alpha}}{\sum_{k \in C(j)} \|T_j - T_k\|^{-\alpha}} \le 2s \right\}, \qquad
   \mathrm{Out}_D(j) := \left\{ \frac{\|T_i - T_j\|^{-\alpha}}{\sum_{k \in D(j)} \|T_j - T_k\|^{-\alpha}} \le 2s \right\}. \]

Note that if a link $i \to j$ is in outage, at least one of $\mathrm{Out}_C(j)$ and $\mathrm{Out}_D(j)$
must be occurring. Note also that the (marginal) distributions of $\mathrm{Out}_C(j)$ and
$\mathrm{Out}_D(j)$ are independent of the node $j$. When the node index is irrelevant, we
suppress it.

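Suppose $n$ nodes all try to communicate at the rate $r = \log(1 + s)$. Then
for $\epsilon \in (0, \tfrac{1}{2})$,

\begin{align*}
&\mathbb{P}(\text{total rate} < (1 - 2\epsilon) n r) \\
&\quad \le \mathbb{P}(\text{number of outages} > 2 \epsilon n) \\
&\quad \le \mathbb{P}(\text{number of distant outages} > \epsilon n \,\cup\, \text{number of close outages} > \epsilon n) \\
&\quad \le \mathbb{P}(\text{number of distant outages} > \epsilon n) + \mathbb{P}(\text{number of close outages} > \epsilon n) \\
&\quad = \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_D(j)] > \epsilon n \right) + \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] > \epsilon n \right). \tag{3.9}
\end{align*}

We bound the two terms above separately.

For the first term in (3.9), by Markov's inequality, we have

\[ \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_D(j)] > \epsilon n \right) \le \frac{\mathbb{E} \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_D(j)]}{\epsilon n} = \frac{1}{\epsilon} \mathbb{P}(\mathrm{Out}_D). \]

Using the same ideas as in the proof of Theorem 3.3, we can show that this
term tends to zero for $\alpha > d$ at rate $O(n^{-a(\alpha - d)})$. (The key is to change the
lower limit in the integral in (3.8) to $n^a$.)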

We bound the second term in (3.9) using the idea that for most $j$ and $k$, $C(j)$
and $C(k)$ are disjoint, so the corresponding contributions will be independent.

Let $N_j$ denote the number of nodes whose 'close' regions overlap with $j$'s
'close' region; that is

\[ N_j := \#\{k \ne j : C(j) \cap C(k) \ne \emptyset\}. \]

Note that $N_j$ is the number of nodes in a ball of radius $2n^a$, so is Poisson with
mean $2^d v(d) n^{ad}$.

We write $\mathcal{N} = \{\max_{0 \le j \le n-1} N_j \ge 2^d v(d) n^{ad} e\}$ for the event that one of the
$N_j$ is particularly large. We will argue that the event $\mathcal{N}$ can be ruled out with
high probability, allowing good control of the growth of the variance.

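We exploit the fact that, conditioned on the event $\mathcal{N}^c$, for any $j$ there are at
most $1 + 2^d v(d) n^{ad} e$ indices $k$ such that $\mathrm{Cov}(\mathbb{1}[\mathrm{Out}_C(j)], \mathbb{1}[\mathrm{Out}_C(k)] \mid \mathcal{N}^c)$ is
non-zero, and each such covariance is no greater than 1. Hence we can control
the growth of the variance of the sum as

\begin{align*}
\mathrm{Var}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] \,\middle|\, \mathcal{N}^c \right)
  &\le \sum_{j=0}^{n-1} \bigl(1 + 2^d v(d) n^{ad} e\bigr) \mathrm{Var}\bigl(\mathbb{1}[\mathrm{Out}_C(j)] \bigm| \mathcal{N}^c\bigr) \\
  &\le \bigl(1 + 2^d v(d) n^{ad} e\bigr) n. \tag{3.10}
\end{align*}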



By the union bound and Lemma 3.6, we have

\[ \mathbb{P}(\mathcal{N}) \le \sum_{j=0}^{n-1} \mathbb{P}\bigl(N_j \ge 2^d v(d) n^{ad} e\bigr) \le n e^{-2^d v(d) n^{ad}}. \tag{3.11} \]

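Using the law of total probability, and substituting (3.10) and (3.11) into
Chebyshev's inequality gives

\begin{align*}
\mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] > \epsilon n \right)
  &= \mathbb{P}(\mathcal{N}) \, \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] > \epsilon n \,\middle|\, \mathcal{N} \right)
   + \mathbb{P}(\mathcal{N}^c) \, \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] > \epsilon n \,\middle|\, \mathcal{N}^c \right) \\
  &\le n e^{-2^d v(d) n^{ad}} + \mathbb{P}\left( \sum_{j=0}^{n-1} \mathbb{1}[\mathrm{Out}_C(j)] > \epsilon n \,\middle|\, \mathcal{N}^c \right) \\
  &\le n e^{-2^d v(d) n^{ad}} + \frac{\bigl(1 + 2^d v(d) n^{ad} e\bigr) n}{n^2 \bigl(\epsilon - \mathbb{E}(\mathbb{1}[\mathrm{Out}_C] \mid \mathcal{N}^c)\bigr)^2} \\
  &= n e^{-2^d v(d) n^{ad}} + \frac{\bigl(1 + 2^d v(d) n^{ad} e\bigr) n}{n^2 \bigl(\epsilon - \mathbb{P}(\mathrm{Out}_C \mid \mathcal{N}^c)\bigr)^2}.
\end{align*}

As we send $n \to \infty$, this tends to 0 at rate $O(n^{ad - 1})$.

Setting $a = 1/\alpha$ means the two terms both tend to 0 at the same rate, with exponent

\[ ad - 1 = -a(\alpha - d) = -\left(1 - \frac{d}{\alpha}\right), \]

as desired.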

□ 
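The choice $a = 1/\alpha$ in the proof above can be motivated by balancing the two decay rates. The following short derivation is our own gloss on that step rather than part of the original argument: the distant-outage term decays like $n^{-a(\alpha - d)}$ and the close-outage term like $n^{ad - 1}$, and equating the exponents gives

```latex
\begin{align*}
ad - 1 = -a(\alpha - d)
  \iff ad - 1 = -a\alpha + ad
  \iff a\alpha = 1
  \iff a = \frac{1}{\alpha}.
\end{align*}
```

A smaller $a$ would slow the decay of the distant-outage term, while a larger $a$ would slow the decay of the close-outage term, so the slower of the two rates is fastest at $a = 1/\alpha$.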



3.4 Further work 

The strength of a Poisson model is the extensibility and mathematical flexibil- 
ity of the Poisson point process. 

For just one example, in a spatially inhomogeneous Poisson process, there
is a density function $\lambda \colon \mathbb{R}^d \to \mathbb{R}_+$, such that the number of nodes in some
region $A$ is $\mathrm{Po}\bigl(\int_A \lambda(x) \, dx\bigr)$. Under what conditions on $\lambda$ is linear growth still
possible?

Many other open questions remain. 



Notes 

The new work in this chapter is joint work with Oliver Johnson and Robert
Piechocki. It has not previously been published, although the research was
conducted prior to some since-published work [59, 55].

The monographs of Baccelli and Błaszczyszyn [59] and Haenggi and Ganti
[55] were useful background on the SINR approach to stochastic networks, as
was the work of Haenggi [56], Xie and Kumar [54], and Gupta and Kumar
[35].

The textbook of Kingman [60] was useful background material for Poisson
processes.

Chapter 4: Sum-capacity of random dense Gaussian interference networks

In this chapter, we will find approximations to the sum-capacity of interference
networks with many users. Our interference networks will be motivated
by physical models of wireless networks.

When we say many users, we will be allowing the number of users $n$ to
tend to infinity, and looking at asymptotic behaviour. We will be doing this
without expanding the area in which the nodes reside, so the nodes will get
packed closer and closer together - for this reason, such networks are called
dense networks.

We will model transmitters and receivers as being placed randomly in
space. This means that signal- and interference-to-noise ratios will be random
too, but fixed for the duration of communication - and hence a form of
slow fading.

The crucial insight to prove these results is that the performance in these
networks is tightly constrained by the performance on a few so-called bottleneck
links. Ideas from interference alignment (specifically ergodic interference
alignment) are crucial in performing well on these links, and hence in the
whole network.

The main result (Theorem 4.3) will be to show, for the model we will consider,
that the sum-capacity is roughly $\frac{n}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$. Specifically, we
show that $C_\Sigma / n$ converges in probability to $\frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$ as $n \to \infty$.

In this chapter, random positions will give a form of slow fading, and we
will also add fast fading, to make a realistic model of real-world networks.
All results in this chapter hold provided that this fast fading is symmetrical
(in the sense that $H_{ji}$ and $-H_{ji}$ are identically distributed, for all $i$ and $j$).

For simplicity and concreteness, we will choose to give all our results in
the context of random phase fading, so

\[ H_{ji}[t] = \begin{cases} \exp(\mathrm{i}\Theta_{ji}[t]) \sqrt{\mathrm{SNR}_i} & \text{for } i = j, \\ \exp(\mathrm{i}\Theta_{ji}[t]) \sqrt{\mathrm{INR}_{ji}} & \text{for } i \ne j. \end{cases} \]

In Section 4.6, we briefly discuss how to apply other fading models such as
Rayleigh fading.

4.1 Introduction 

Recently, progress has been made on many-user approximations to the sum-capacity
$C_\Sigma$ of random Gaussian interference networks.

In particular, in a 2009 paper, Jafar [27, Theorem 5] proved a result on the
asymptotic sum-capacity of a particular random Gaussian interference network:

Theorem 4.1. Suppose direct SNRs are fixed and identical, so $\mathrm{SNR}_i = \mathrm{snr}$ for all
$i$, and suppose that all INRs are IID random and supported on some neighbourhood
of snr. Then the average per-user capacity $C_\Sigma / n$ tends in probability to
$\frac{1}{2} \log(1 + 2\,\mathrm{snr})$ as $n \to \infty$.

We examine Jafar's result in detail later.

(Here and elsewhere, we use $C_\Sigma$ to denote the sum-capacity of the network,
and interpret $C_\Sigma / n$ as the average per-user capacity.)

A subsequent result by Johnson, Aldridge and Piechocki [1, Theorem 4.1]
concerned a more physically realistic model, the standard dense network:

Theorem 4.2. Suppose receivers and transmitters are placed IID uniformly at random
on the unit square $[0, 1]^2$, and suppose that signal power attenuates like a polynomial
in 1/distance. Then the average per-user capacity $C_\Sigma / n$ tends in probability
to $\frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$ as $n \to \infty$.

In this chapter, we prove a similar, but more general, result to Theorem 4.2,
with a neater proof, using ideas from Jafar's proof of Theorem 4.1. We assume
transmitters and receivers are situated independently at random in space (not
necessarily uniformly), and that the power of a signal depends in a natural
way on the distance it travels.

Specifically our result is the following (full definitions of non-italicised
technical terms are in Section 4.2):

Theorem 4.3. Consider a Gaussian interference network formed by $n$ pairs of nodes
placed in a spatially separated IID network with power-law attenuation. Then
the average per-user capacity $C_\Sigma / n$ converges in probability to $\frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$,
in that for all $\epsilon > 0$,

\[ \mathbb{P}\left( \left| \frac{C_\Sigma}{n} - \frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR}) \right| > \epsilon \right) \to 0 \quad \text{as } n \to \infty. \]

The direct part of the proof uses interference alignment; specifically, we 
take advantage of ergodic interference alignment (see Section 2.6). 

The converse part of the proof uses the idea of 'bottleneck links' developed
by Jafar [27]. An information theoretic argument gives a capacity bound on
such bottleneck links, and probabilistic counting arguments show there are
sufficiently many such links to tightly bound the sum-capacity of the whole
network.

Before going any further, we should mention two similar results in the
same area using different interference alignment techniques.

A paper by Özgür and Tse [61] proves linear scaling in interference networks
by showing that for any value of snr and $n$, the sum-rate is bounded
below by $k_1 n \log(1 + k_2\,\mathrm{snr})$, where $k_1, k_2 > 0$ are universal constants.

Secondly, a paper by Niesen [62] bounds the capacity region (rather than
just the sum-capacity) of arbitrary dense networks, albeit with a factor of
$O(\log n)$ separating the inner and outer bounds.

4.2 Model

We outline the model we will use. We will model separately how messages 
are transmitted, and how nodes are positioned. 

These ideas were introduced in an earlier paper [1], but were not fully 
exploited, due to that paper's concentration on the standard dense network. 

4.2.1 Communication model 

We will use the n-user Gaussian interference network as our main model. That
is, we have

\[ Y_j[t] = \sum_{i=1}^{n} H_{ji}[t] \, x_i[t] + Z_j[t]. \]

The fading coefficients $H_{ji}[t]$ will be made up of a fast fading part, representing
the moment-to-moment changes in the channel, and a slow fading
part, the power attenuation due to node placing.

The results require the fast fading to be symmetric, in that $H_{ji}$ and $-H_{ji}$
are identically distributed - this is ensured by the random phase we use (and
is also satisfied by Rayleigh fading). For ease of notation and to simplify exposition,
we will assume the fast fading takes the form of a (uniformly) random
phase change (although we briefly discuss more general models in Section
4.6). Therefore, we can write the fading coefficients in modulus-argument
form as

\[ H_{ii} = \exp(\mathrm{i}\Theta_{ii}[t]) \sqrt{\mathrm{SNR}_i}, \qquad H_{ji} = \exp(\mathrm{i}\Theta_{ji}[t]) \sqrt{\mathrm{INR}_{ji}}, \quad j \ne i, \]

where the $\Theta_{ji}[t]$ are IID uniform on $[0, 2\pi)$, and $\mathrm{SNR}_i = |H_{ii}|^2$ and $\mathrm{INR}_{ji} = |H_{ji}|^2$
are the squared moduli (which are constant over time).

Our results are in the context of so-called 'line of sight' communication
models, without multipath interference. That is, we consider a model where
signal strengths attenuate deterministically with distance according to some
function $a$.

Definition 4.4. Fix transmitter node positions $\{T_1, \dots, T_n\} \subset \mathbb{R}^d$ and receiver
node positions $\{R_1, \dots, R_n\} \subset \mathbb{R}^d$, and consider Euclidean distance $\|\cdot\|$ and
an attenuation function $a \colon \mathbb{R}_+ \to \mathbb{R}_+$.

We define $\mathrm{SNR}_i = a(\|R_i - T_i\|)$, and for all pairs with $i \ne j$, define
$\mathrm{INR}_{ji} = a(\|R_j - T_i\|)$.
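Definition 4.4 translates directly into a small computation. The sketch below is illustrative only: the function name `gain_matrix`, the example positions and the attenuation $a(\rho) = \rho^{-2}$ are our own assumptions. It returns a matrix whose diagonal entries are the $\mathrm{SNR}_i$ and whose off-diagonal entry in row $j$, column $i$ is $\mathrm{INR}_{ji}$.

```python
import math


def gain_matrix(transmitters, receivers, attenuation):
    """Return G with G[j][i] = attenuation(||R_j - T_i||).

    Diagonal entries G[i][i] are the SNR_i; off-diagonal entries G[j][i]
    (row = receiver j, column = transmitter i) are the INR_ji.
    """
    return [
        [attenuation(math.dist(r, t)) for t in transmitters]
        for r in receivers
    ]


# Hypothetical two-user example with attenuation a(rho) = rho^(-2).
T = [(0.0, 0.0), (1.0, 0.0)]
R = [(0.0, 1.0), (1.0, 1.0)]
G = gain_matrix(T, R, lambda rho: rho ** -2)
print(G)
```

In this example each direct link has length 1 and each crosslink has length $\sqrt{2}$, so the direct gains are twice the cross gains.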

We consider the n-user Gaussian interference network. So transmitter $i$
sends a message encoded as a codeword $x_i = (x_i[1], \dots, x_i[T])$ to receiver
$i$, under a power constraint $\frac{1}{T} \sum_{t=1}^{T} |x_i[t]|^2 \le 1$ for each $i$. The $t$th symbol
received at receiver $j$ is given as

\[ Y_j[t] = \exp(\mathrm{i}\Theta_{jj}[t]) \sqrt{\mathrm{SNR}_j} \, x_j[t] + \sum_{i \ne j} \exp(\mathrm{i}\Theta_{ji}[t]) \sqrt{\mathrm{INR}_{ji}} \, x_i[t] + Z_j[t], \]

where the noise terms $Z_j[t]$ are independent standard complex Gaussian random
variables, and the phases $\Theta_{ji}[t]$ are independent $U[0, 2\pi)$ random variables
independent of all other terms. The $\mathrm{INR}_{ji}$ and $\mathrm{SNR}_i$ remain fixed over
time, since the node positions themselves are fixed, but the phases are fast-fading,
in that they are renewed for each $t$.

Definition 4.5. We say an attenuation function $a$ has power-law attenuation if
there exist constants $\alpha$ and $k_{\mathrm{att}}$ such that for all $\rho$, we have $a(\rho) \le k_{\mathrm{att}} \rho^{-\alpha}$.

In particular, standard power-law decay of the form $a(\rho) = h \rho^{-\alpha/2}$ clearly
satisfies this with $k_{\mathrm{att}} = h$ and $\alpha$ set to $\alpha/2$. Other models we discussed in
Section 1.4 such as $a(\rho) = h \min\{1, \rho^{-\alpha/2}\}$ and $a(\rho) = h (\rho + \rho_0)^{-\alpha/2}$ also
satisfy this.

For brevity, we write $S_{ji}$ for the random variables $\frac{1}{2} \log(1 + 2\,\mathrm{INR}_{ji})$ (when
$i \ne j$), and $S_{ii}$ for $\frac{1}{2} \log(1 + 2\,\mathrm{SNR}_i)$, which are functions of the distance between
the transmitters and receivers. In particular, since the nodes are positioned
independently, under this model the random variables $S_{ji}$ are identically distributed,
and $S_{ji}$ and $S_{lk}$ are IID when $\{i, j\}$ and $\{k, l\}$ are disjoint.

We will also write $E = \mathbb{E} S_{ii} = \frac{1}{2} \mathbb{E} \log(1 + 2\,\mathrm{SNR})$, noting that this is independent
of $i$. (It is also true that $E = \mathbb{E} S_{ji}$ for all $i$ and $j$.) Lemma 4.12 later
ensures that $E$ is indeed finite.

4.2.2 Node position model 

We believe that our techniques should work in a variety of models for the 
node positions. We outline one very natural scenario here. 

Definition 4.6. Consider two probability distributions $\mathbb{P}_T$ and $\mathbb{P}_R$ defined on
d-dimensional space $\mathbb{R}^d$. Given an integer $n$, we sample $n$ transmitter node
positions $T_1, \dots, T_n$ independently from the distribution $\mathbb{P}_T$. Similarly, we
sample $n$ receiver node positions $R_1, \dots, R_n$ independently from distribution
$\mathbb{P}_R$. We refer to such a model of node placement as an IID network.

(Equivalently, we could state that transmitter and receiver positions are
distributed like two independent non-homogeneous Poisson processes, conditioned
such that there are $n$ points of each type.)

We pair the transmitter and receiver nodes up so that transmitter $i$ at $T_i$
wishes to communicate with receiver $i$ at $R_i$ for each $i$.




Transmitters and receivers that are very close to each other will lead to
very strong interference or signals. These extreme occurrences could prevent
the network from operating as we would like. Also, as we remarked in Section
1.4, our attenuation models lose physical accuracy at very small distances. For
this reason, we will demand that our node positioning model ensures that this
is rare, a property we call spatial separation.

Definition 4.7. Let $T \sim \mathbb{P}_T$ and $R \sim \mathbb{P}_R$ be placed independently in $\mathbb{R}^d$. We
say the IID network is spatially separated if there exist constants $\beta > 0$ and $k_{\mathrm{sep}}$
such that for all $\rho$

\[ \mathbb{P}(\|R - T\| \le \rho) \le k_{\mathrm{sep}} \rho^{\beta}. \]

In particular, we can show that the standard dense network is spatially
separated.

Definition 4.8. The d-dimensional standard dense network is an IID network defined
by $\mathbb{P}_T$ and $\mathbb{P}_R$ being independent uniform measures on $[0, 1]^d$.

[Figure: a standard dense network, showing transmitters $T_i$ and receivers $R_i$ placed in the unit square.]


Lemma 4.9. The standard dense network is spatially separated.

Proof. We need to show that Definition 4.7 is fulfilled. By conditioning on
$T = t$, we get

\[ \mathbb{P}(\|R - T\| \le \rho) = \int_{[0,1]^d} \mathbb{P}(\|R - t\| \le \rho) \, dt \le \int_{[0,1]^d} v(d) \rho^d \, dt = v(d) \rho^d, \]

where $v(d)$ is the volume of the d-dimensional unit ball $B(0, 1)$. Taking $k_{\mathrm{sep}} = v(d)$,
$\beta = d$ gives the result. □
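The bound in Lemma 4.9 can be checked by simulation. The following is a minimal sketch under our own assumptions (function names, sample sizes and seed are illustrative): it draws independent uniform transmitter and receiver positions in the unit cube and compares the empirical frequency of $\|R - T\| \le \rho$ with $v(d) \rho^d$.

```python
import math
import random


def unit_ball_volume(d):
    """Volume v(d) of the Euclidean unit ball in R^d."""
    return math.pi ** (d / 2) / math.gamma(1 + d / 2)


def estimate_close_probability(d, rho, samples=200_000, seed=0):
    """Monte Carlo estimate of P(||R - T|| <= rho) for R, T uniform on [0, 1]^d."""
    rng = random.Random(seed)
    close = 0
    for _ in range(samples):
        t = [rng.random() for _ in range(d)]
        r = [rng.random() for _ in range(d)]
        if math.dist(r, t) <= rho:
            close += 1
    return close / samples


for d, rho in ((1, 0.2), (2, 0.1)):
    print(d, rho, estimate_close_probability(d, rho), unit_ball_volume(d) * rho ** d)
```

The estimate sits a little below the bound, because a ball of radius $\rho$ around a point near the boundary is partly outside the cube.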

The standard dense network has been the subject of much research (see
the review paper of Xue and Kumar [36] and references therein). However,
we emphasise that our result holds for a wider range of models.



4.3 Jafar network 

We now review in more detail Jafar's important result. We call a network with 
fixed equal snrs and IID INRs a Jafar network. (Note that the Jafar network 
cannot be written as an IID network with power-law attenuation.) 






Theorem 4.1 restated. Suppose direct SNRs are fixed and identical, so $\mathrm{SNR}_i = \mathrm{snr}$
for all $i$, and suppose that all INRs are IID random and supported on some neighbourhood
of snr. Then the average per-user capacity $C_\Sigma / n$ tends in probability to
$\frac{1}{2} \log(1 + 2\,\mathrm{snr})$ as $n \to \infty$.

Proving the direct part of this result is simple: the central result of ergodic
interference alignment (Theorem 2.15) tells us that the rates $\frac{1}{2} \log(1 + 2\,\mathrm{snr})$
are simultaneously achievable by all users.

For the converse part, Jafar defines [27, proof of Theorem 5] the crosslink
$i \to j$ as being an $\epsilon$-bottleneck link if the two-user network
with links $i \to i$ and $j \to j$ with crosslink $i \to j$ has sum-rate bounded by

\[ r_i + r_j \le \log(1 + 2\,\mathrm{snr}) + \epsilon \tag{4.1} \]

for some fixed $\epsilon > 0$. Analysis of these bottleneck links then gives the converse
result.

We will use a similar method to this to prove our main result (Theorem
4.3). We will need to alter the definition of bottleneck links slightly to fit our
needs. Also, the entire converse part is made more complicated: while the
Jafar network has lots of convenient independences between the snr and INRs,
we are not so lucky. Therefore extra care must be taken.

We also give here an alternative proof of Theorem 4.1. This proof uses techniques
from graph theory, and gives a faster rate of convergence than Jafar's
own proof - exponential, rather than 0{n~^). 

Alternative proof of Theorem 4.1. We need to show that every user is involved in
a bottleneck link. We will first review some facts from random graph theory.

Form a random bipartite graph by taking two sets $V$, $W$ of vertices each
of size $K$, and making each edge from $V$ to $W$ present independently with
probability $\delta$ (there are no edges within either $V$ or $W$). A matching is a set of
$K$ of the edges such that every vertex in the graph is adjacent to one edge - so
each vertex $v \in V$ is matched to a vertex $w \in W$ by an edge $vw$.



[Figure: a bipartite graph on vertex sets $V$ and $W$ (left) and a matching (right).]




A classical result due to Erdős and Rényi [63, Theorem 2] (originally stated
in terms of random matrices) states that the probability that a matching fails
to exist tends to 0 for any $\delta = \delta(K) = (\log K + \omega(1))/K$, where $\omega(1)$ is, using
Bachmann-Landau asymptotic notation, a term that tends to $\infty$.

We recall the argument where $\delta$ is a fixed constant, so that we can be precise
about the bounds, rather than just working asymptotically. Following
Walkup [64, Definition 1], we say that a subset $V' \subset V$ of size $k$ and a subset
$W' \subset W$ of size $K - k + 1$ form a blocking pair of size $k$ if no edge of the graph
connects $V'$ to $W'$. Walkup [64, Section 3] uses König's theorem (equivalently
one can use Hall's marriage theorem) to deduce that

\begin{align*}
\mathbb{P}(\text{no matching from } V \text{ to } W)
  &\le \sum_{k=1}^{K} \sum_{\substack{|V'| = k \\ |W'| = K - k + 1}} \mathbb{P}\bigl((V', W') \text{ a blocking pair}\bigr) \\
  &= \sum_{k=1}^{K} \binom{K}{k} \binom{K}{K - k + 1} (1 - \delta)^{k(K - k + 1)}.
\end{align*}

We split this sum into the terms where $k \le \sqrt{K}$ and those where $k > \sqrt{K}$.
Using the bounds 



exp f-S for k<VK, 

exp -S — — for > y/K, 



we get the bound

P(no matching) < iVKK^^^^exp ("^^y^) +22^exp ^-^^j • 

We deduce that the probability of a complete matching failing to exist decays 
at an exponential rate in K. 
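The decay of the matching failure probability can also be observed empirically. The sketch below is our own illustration, not the argument of Walkup or of Erdős and Rényi: it samples random bipartite graphs with edge probability $\delta$ and tests for a perfect matching with a standard augmenting-path (Kuhn) algorithm. All function names and parameter values are hypothetical.

```python
import random


def has_perfect_matching(adj, K):
    """Augmenting-path (Kuhn) test; adj[v] lists the neighbours in W of v in V."""
    match_of_w = [-1] * K  # match_of_w[w] = vertex of V currently matched to w

    def try_augment(v, seen):
        for w in adj[v]:
            if not seen[w]:
                seen[w] = True
                if match_of_w[w] == -1 or try_augment(match_of_w[w], seen):
                    match_of_w[w] = v
                    return True
        return False

    # A perfect matching exists exactly when every vertex of V can be matched.
    return all(try_augment(v, [False] * K) for v in range(K))


def random_bipartite_graph(K, delta, rng):
    """Each of the K * K possible edges is present independently with probability delta."""
    return [[w for w in range(K) if rng.random() < delta] for _ in range(K)]


def matching_failure_rate(K, delta, trials=200, seed=0):
    """Fraction of sampled graphs that have no perfect matching."""
    rng = random.Random(seed)
    failures = sum(
        not has_perfect_matching(random_bipartite_graph(K, delta, rng), K)
        for _ in range(trials)
    )
    return failures / trials


# For fixed delta the failure rate should fall quickly as K grows.
for K in (5, 10, 20, 40):
    print(K, matching_failure_rate(K, 0.3))
```

With a fixed edge probability the printed failure rates should drop towards zero as $K$ increases, consistent with the exponential decay derived above.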

We can now prove Theorem 4.1.

Construct a random bipartite graph by dividing the receiver-transmitter
links into two groups $V$ and $W$ of size $K = n/2$. Choose $\epsilon > 0$ and, for $i \in V$
and $j \in W$, include the edge $ij$ if either of the crosslinks $i \to j$ or $j \to i$ is an
$\epsilon$-bottleneck link, which in the Jafar network occurs independently with some
probability $\delta$.

We seek a matching on this graph. For each pair $(i, j)$ that is successfully
matched up, the corresponding two-user channel is an $\epsilon$-bottleneck, and
hence has the bound $r_i + r_j \le \log(1 + 2\,\mathrm{snr}) + \epsilon$ by (4.1), for any achievable
rates. If every vertex is matched this way, we have

\[ r_\Sigma = \sum_{i=1}^{n} r_i = \sum_{\text{matches } (i,j)} (r_i + r_j) \le \frac{n}{2} \bigl( \log(1 + 2\,\mathrm{snr}) + \epsilon \bigr). \]

The high probability of a matching existing implies exponential decay of

\[ \mathbb{P}\left( \frac{C_\Sigma}{n} - \frac{1}{2} \log(1 + 2\,\mathrm{snr}) > \frac{\epsilon}{2} \right), \]

proving the theorem with exponential convergence. □

For our IID networks, extra dependencies between links make the picture 
much more difficult, so we have been unable to find a proof that extends this 
random bipartite graph method. Whether or not exponential convergence 
holds for IID networks with power-law attenuation is an open problem. 

4.4 Proof: achievability 

We can now prove our main theorem, Theorem 4.3, by breaking the probability
into two terms, which we deal with separately. So

\[ \mathbb{P}\left( \left| \frac{C_\Sigma}{n} - E \right| > \epsilon \right) = \mathbb{P}\left( \frac{C_\Sigma}{n} - E < -\epsilon \right) + \mathbb{P}\left( \frac{C_\Sigma}{n} - E > \epsilon \right). \tag{4.2} \]

Bounding the first term of (4.2) corresponds to the achievability part of the
proof. Bounding the second term of (4.2) corresponds to the converse part,
and represents our major contribution.

We prove the direct part using ergodic interference alignment.

Proof. The first term of (4.2) can be bounded relatively simply, using ergodic
interference alignment. A theorem of Nazer, Gastpar, Jafar, and Vishwanath
(Theorem 2.15 of this thesis) implies that the rates

\[ R_i = \tfrac{1}{2}\log(1 + 2\,\mathrm{SNR}_i) = S_{ii} \]

are simultaneously achievable.

This implies that $C_\Sigma \ge \sum_{i=1}^{n} R_i = \sum_{i=1}^{n} S_{ii}$. This allows us to bound the
first term in (4.2) as

\[ \mathbb{P}\left( \frac{C_\Sigma}{n} - E < -\varepsilon \right) \le \mathbb{P}\left( \frac{1}{n}\sum_{i=1}^{n} S_{ii} < E - \varepsilon \right). \]

But $E = \mathbb{E} S_{ii}$, so this probability tends to $0$ by the weak law of large numbers. □

4.5 Proof: converse 

We now need to show that the second term of (4.2) tends to $0$ too. Specifically,
we must prove the following: for all $\varepsilon > 0$,

\[ \mathbb{P}\left( \frac{C_\Sigma}{n} > E + \varepsilon \right) \to 0 \quad \text{as } n \to \infty. \tag{4.3} \]

The proof of the converse part is the major new part of this chapter. First,
bottleneck links are introduced, and we prove a tight information-theoretic
bound on the capacity of such links. Second, a probabilistic counting argument
ensures there are (with high probability) sufficiently many bottleneck
links to bound the sum-capacity of the entire network.



4.5.1 Bottleneck links 

The important concept is that of the bottleneck link, an idea first used by Jafar 
[27] and later adapted by Johnson, Aldridge, and Piechocki [1] in the follow- 
ing form: 

Definition 4.10. We say the link $i \to j$, $i \ne j$, is an $\varepsilon$-bottleneck link if the
following three conditions hold:

Bl Sii <E + ell, 

B2 Sy; < E + ell, 

B3 Syy < Su]- 

We let $B_{ij}$ be the indicator function that the crosslink $i \to j$ is an $\varepsilon$-bottleneck
link. We also define the bottleneck probability $\beta := \mathbb{E} B_{ij}$ to be the probability
that a given link is an $\varepsilon$-bottleneck, which is independent of $i$ and $j$ for an IID
network. (We suppress the $\varepsilon$ dependence for simplicity.)

This definition does have a physical interpretation (although the only
motivation for it is that it allows us to prove the bound in Lemma 4.11).

The physical interpretation is this: fix the position $T_i$ of transmitter $i$.
Condition B1 requires that receiver $i$ is sufficiently far away from $T_i$, and
condition B2 requires that receiver $j$ is sufficiently far away too.
Condition B3 requires that transmitter $j$ is closer to receiver $i$ than to
receiver $j$.

The crucial point about bottleneck links is the constraints they place on 
achievable rates in a network. 



Lemma 4.11. Consider a crosslink $i \to j$ in an $n$-user Gaussian interference network.
If $i \to j$ is an $\varepsilon$-bottleneck link, then the sum of their achievable rates is bounded by
$r_i + r_j \le 2E + \varepsilon$.



Proof. First, note that we make things no worse by considering the two-user
subnetwork:

\[ Y_i = \exp(\mathrm{i}\theta_{ii})\sqrt{\mathrm{SNR}_i}\,X_i + \exp(\mathrm{i}\theta_{ij})\sqrt{\mathrm{INR}_{ij}}\,X_j + Z_i, \]

where receiver $i$ needs to determine message $m_i$, and receiver $j$ message $m_j$.
(The time index is omitted for clarity.)

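From bottleneck conditions B1 and B2 we have

\[ 1 + 2\,\mathrm{SNR}_i \le \exp(2E + \varepsilon), \qquad 1 + 2\,\mathrm{INR}_{ij} \le \exp(2E + \varepsilon). \]

Summing and taking logs gives

\[ \log(1 + \mathrm{SNR}_i + \mathrm{INR}_{ij}) \le 2E + \varepsilon. \tag{4.4} \]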

We combine this with the argument given by Jafar [27], which we discussed
earlier (Section 2.3). Let $r_i$ and $r_j$ be jointly achievable rates for the
subnetwork. In particular, receiver $i$ can determine message $m_i$ with an
arbitrarily low probability of error.

We certainly do no worse if a genie presents message $m_i$ to receiver $j$, so
we assume receiver $j$ can indeed recover $m_i$. But condition B3 ensures that it is
easier for receiver $i$ to determine $m_j$ than it is for receiver $j$ (since the weighting
is larger in the first case). So since receiver $j$ can recover $m_j$ (as $r_j$ is achievable),
receiver $i$ can recover $m_j$ also.

Because receiver $i$ can determine both $m_i$ and $m_j$, these two signals must
have been transmitted at a sum-rate no higher than the sum-capacity of the
Gaussian multiple-access channel (Theorem 2.8). Hence,

\[ r_i + r_j \le \log(1 + \mathrm{SNR}_i + \mathrm{INR}_{ij}) \le 2E + \varepsilon, \]

where the second inequality comes from (4.4). □

4.5.2 Three technical lemmas 

A few technical lemmas are required in order to prove (4.3). 

First, we need to ensure that very high SNRs are very rare (Lemma 4.12). 
Second, we need to show that bottleneck links will actually occur (Lemma 
4.13). Last, we must show that the number of bottleneck links cannot vary too 
much (Lemma 4.14). 

Under any network model where these three lemmas are true, our theorem
will hold. We emphasise that our model of IID networks with power-law
attenuation is one such model; we believe the result holds more widely.

Lemma 4.12. Consider a spatially separated IID network, with power-law
attenuation. Then for any $\eta > 0$,

\[ \mathbb{P}\Bigl( \max_{1 \le i \le n} S_{ii} > n^{\eta/2} \Bigr) = O(n^{-1}) \]

as $n \to \infty$.

In fact, in our case convergence to $0$ is considerably quicker than $O(n^{-1})$,
but this is sufficient.

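Proof. First, we have by the union bound

\[ \mathbb{P}\Bigl( \max_{1 \le i \le n} S_{ii} > n^{\eta/2} \Bigr) \le n\,\mathbb{P}\bigl( S_{11} > n^{\eta/2} \bigr). \tag{4.5} \]

Now we apply the definition of $S_{11} := \tfrac{1}{2}\log(1 + 2\,\mathrm{SNR}_1)$ and recall that
$\mathrm{SNR}_1 = a(\|R_1 - T_1\|)$ (Definition 4.4) to get

\[ \mathbb{P}\bigl( S_{11} > n^{\eta/2} \bigr) = \mathbb{P}\Bigl( \mathrm{SNR}_1 > \tfrac{1}{2}\bigl(2^{2n^{\eta/2}} - 1\bigr) \Bigr) = \mathbb{P}\Bigl( a(\|R_1 - T_1\|) > \tfrac{1}{2}\bigl(2^{2n^{\eta/2}} - 1\bigr) \Bigr). \]

Since $a$ is a power-law attenuation function, we have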

P(Su > n^l^) < P (fcattllRi - Till"'^ > - 1; 

^P(||R.-T.||<(^,2--1,)-"'); 
and since the network is spatially separated, we have 

P(Sn > n^'^) < fcsep (2^(2'"''' - 1)) - 0{n-^) 

(and obviously much tighter than $O(n^{-2})$). Together with (4.5), this gives the
result. □

(It is worth noting that this fast decay in the tails of $S_{ii}$ ensures that the
expectation $E = \mathbb{E} S_{ii}$ does indeed exist and is finite.)

We will often condition off this event; that is, condition on the
complementary event $\{\max_i S_{ii} \le n^{\eta/2}\}$. We use $\mathbb{P}_n$, $\mathbb{E}_n$ and $\operatorname{Var}_n$ to denote such
conditionality, and write $\beta_n = \mathbb{E}_n B_{ij}$ for the conditional bottleneck probability.

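The next two lemmas concern showing that conditional probabilities are
nonzero. However, we have for any event $A$,

\begin{align*}
\mathbb{P}(A) &= \mathbb{P}\bigl(A \mid \max_i S_{ii} \le n^{\eta/2}\bigr)\,\mathbb{P}\bigl(\max_i S_{ii} \le n^{\eta/2}\bigr) \\
&\quad + \mathbb{P}\bigl(A \mid \max_i S_{ii} > n^{\eta/2}\bigr)\,\mathbb{P}\bigl(\max_i S_{ii} > n^{\eta/2}\bigr),
\end{align*}

and hence by Lemma 4.12 we have the two bounds

\begin{align*}
\mathbb{P}(A) &\le \mathbb{P}\bigl(A \mid \max_i S_{ii} \le n^{\eta/2}\bigr) + \mathbb{P}\bigl(\max_i S_{ii} > n^{\eta/2}\bigr) = \mathbb{P}_n(A) + O(n^{-1}), \\
\mathbb{P}(A) &\ge \mathbb{P}\bigl(A \mid \max_i S_{ii} \le n^{\eta/2}\bigr)\,\mathbb{P}\bigl(\max_i S_{ii} \le n^{\eta/2}\bigr) = \mathbb{P}_n(A)\bigl(1 - O(n^{-1})\bigr),
\end{align*}

and so

\[ \mathbb{P}(A) = \mathbb{P}_n(A) + O(n^{-1}). \tag{4.6} \]

This will be useful in the next two proofs.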

Lemma 4.13. Consider a spatially separated IID network, with power-law
attenuation. Then the conditional bottleneck probability $\beta_n$ is bounded away
from $0$ for all $n$ sufficiently large.

Proof. First note that by (4.6), we need only show that the unconditional
bottleneck probability $\beta$ is nonzero.

Second, note that by the exchangeability of $R_i$ and $R_j$, we have

\[ \mathbb{P}(\text{B1 and B2 and B3}) \ge \tfrac{1}{2}\,\mathbb{P}(\text{B1 and B2}). \]

It is left to show that $\mathbb{P}(\text{B1 and B2})$ is nonzero.

Note that B1 requires $S_{ii}$ to be less than its expectation plus $\varepsilon$. So $R_i$ must be
situated such that this has nonzero probability. So $T_i$ has a nonzero probability
of being positioned such that B1 occurs. But $T_i$ and $T_j$ are also exchangeable,
so we are done. □



Lemma 4.14. Consider a spatially separated IID network with power-law
attenuation. Then, conditional on $\{\max_i S_{ii} \le n^{\eta/2}\}$,

\[ \operatorname{Var}_n(\#\,\text{bottleneck links}) = \operatorname{Var}_n\Bigl( \sum_{i \ne j} B_{ij} \Bigr) = O(n^3), \]

where the sum is over all crosslink pairs $(i, j)$, $i \ne j$.

In general, one might assume that $\operatorname{Var}_n(\#\,\text{bottleneck links})$ would be
proportional to the square of the total number of links, and thus be $O(n^4)$.
However, because of the independences in the IID network, the variance is in fact
much lower.

Proof. First consider the unconditional version. We have

\[ \operatorname{Var}\Bigl( \sum_{i \ne j} B_{ij} \Bigr) = \sum_{i \ne j} \sum_{k \ne l} \operatorname{Cov}(B_{ij}, B_{kl}). \]

The important observation is that for $i, j, k, l$ all distinct, $B_{ij}$ and $B_{kl}$ are
independent, giving $\operatorname{Cov}(B_{ij}, B_{kl}) = 0$. (This is because they depend only on
the position of distinct and independently-positioned nodes.) Hence there are
only $O(n^3)$ non-zero terms in the sum. Each non-zero covariance term in the
sum is bounded by the variance of the indicator function, so

\[ \operatorname{Cov}(B_{ij}, B_{kl}) \le \operatorname{Var} B_{ij} = \beta(1 - \beta) \le \tfrac{1}{4}. \]

But by (4.6), if $\operatorname{Cov}(B_{ij}, B_{kl}) = 0$, then the conditional covariance is very
small: $\operatorname{Cov}_n(B_{ij}, B_{kl}) = O(n^{-1})$. Hence,

\[ \operatorname{Var}_n\Bigl( \sum_{i \ne j} B_{ij} \Bigr) \le O(n^3)\,\tfrac{1}{4} + O(n^4)\,O(n^{-1}) = O(n^3), \]

as desired. □
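
The counting step in this proof can be checked by brute force. The sketch below (an illustration only; the function names are our own) counts the ordered pairs of crosslinks $((i,j),(k,l))$ that share at least one index, which are the only pairs that can have non-zero covariance, and compares the count with the closed form $n^2(n-1)^2 - n(n-1)(n-2)(n-3) = n(n-1)(4n-6)$, which is $O(n^3)$.

```python
from itertools import permutations


def count_overlapping_pairs(n):
    """Count ordered pairs of crosslinks ((i, j), (k, l)) sharing at least one index."""
    links = list(permutations(range(n), 2))  # all crosslinks (i, j) with i != j
    return sum(1 for (i, j) in links for (k, l) in links if {i, j} & {k, l})


def closed_form(n):
    # All n^2 (n-1)^2 ordered pairs of links, minus the n(n-1)(n-2)(n-3)
    # pairs whose four indices are all distinct.
    return n * (n - 1) * (4 * n - 6)


if __name__ == "__main__":
    for n in range(2, 9):
        print(n, count_overlapping_pairs(n), closed_form(n), (n * (n - 1)) ** 2)
```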



4.5.3 Completing the proof 

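We are now in a position to prove (4.3), and hence prove Theorem 4.3.

Proof. We need to show

\[ \forall \varepsilon > 0 \;\; \forall \delta > 0 \;\; \exists N \;\; \forall n \ge N \quad \mathbb{P}\left( \frac{C_\Sigma}{n} > E + \varepsilon \right) < \delta. \]

So choose $\varepsilon > 0$, $\delta > 0$, fix $n \ge N$ (where $N$ will be determined later), and
pick a fixed rate vector $r \in \mathbb{R}^n$ with sum-rate

\[ \frac{r_\Sigma}{n} > E + \varepsilon; \tag{4.7} \]

we need to show that $\mathbb{P}(r \text{ is achievable}) < \delta$. (Here, we are writing
$r_\Sigma := \sum_{i=1}^{n} r_i$ for the sum-rate.)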

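We divide into two cases: when there is a very high SNR, which is unlikely
to happen; and when there is not, in which case $r$ is unlikely to be achievable.
Formally,

\begin{align*}
\mathbb{P}(r \text{ achievable})
&= \mathbb{P}\bigl(r \text{ achievable} \mid \max_i S_{ii} \le n^{\eta/2}\bigr)\,\mathbb{P}\bigl(\max_i S_{ii} \le n^{\eta/2}\bigr) \\
&\quad + \mathbb{P}\bigl(r \text{ achievable} \mid \max_i S_{ii} > n^{\eta/2}\bigr)\,\mathbb{P}\bigl(\max_i S_{ii} > n^{\eta/2}\bigr) \\
&\le \mathbb{P}\bigl(r \text{ achievable} \mid \max_i S_{ii} \le n^{\eta/2}\bigr) + \mathbb{P}\bigl(\max_i S_{ii} > n^{\eta/2}\bigr) \\
&\le \mathbb{P}_n(r \text{ achievable}) + \frac{\delta}{2}, \tag{4.8}
\end{align*}

for $n$ sufficiently large, by Lemma 4.12. We need to bound the first term in
(4.8).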

First, note that our assumption on $\max_i S_{ii}$ means that if $r_i > 2n^{\eta/2}$, then
we break the single-user capacity bound, since we would have

\[ r_i > 2n^{\eta/2} \ge 2\max_j S_{jj} \ge 2S_{ii} = \log(1 + 2\,\mathrm{SNR}_i) > \log(1 + \mathrm{SNR}_i), \]

meaning $r$ is not achievable, and we are done. Thus we assume this does not
hold; that

\[ r_i \le 2n^{\eta/2} \quad \text{for all } i. \tag{4.9} \]

(The rest of our argument closely follows Jafar [27, proof of Theorem 5].)

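Now, if $r$ is achievable, it must at least satisfy the constraints on the
$\varepsilon$-bottleneck links $i \to j$ from Lemma 4.11, and hence also the sum of those
constraints. So

\begin{align*}
\mathbb{P}_n(r \text{ achievable}) &\le \mathbb{P}_n\bigl(r_i + r_j \le 2E + \varepsilon \text{ on bottleneck links } i \to j\bigr) \\
&\le \mathbb{P}_n(U \le V), \tag{4.10}
\end{align*}

where we have defined

\[ U := \frac{1}{n(n-1)} \sum_{i \ne j} B_{ij}(r_i + r_j), \qquad V := \frac{1}{n(n-1)} \sum_{i \ne j} B_{ij}(2E + \varepsilon). \]

The conditional expectations of $U$ and $V$ are

\[ \mathbb{E}_n U = 2\beta_n \frac{r_\Sigma}{n}, \qquad \mathbb{E}_n V = \beta_n(2E + \varepsilon) = 2\beta_n\Bigl(E + \frac{\varepsilon}{2}\Bigr). \]

Note that since $\beta_n > 0$ by Lemma 4.13, we can rewrite (4.7) as

\[ \mathbb{E}_n U > \mathbb{E}_n V + \beta_n \varepsilon, \]

or equivalently,

\[ \mathbb{E}_n U - \frac{\beta_n \varepsilon}{2} > \mathbb{E}_n V + \frac{\beta_n \varepsilon}{2}. \]

The proof is completed by formalising the following idea: since the
expectations are ordered $\mathbb{E}_n U > \mathbb{E}_n V$, we can only rarely have the opposite ordering
$U \le V$. Hence the expression in (4.10) is small.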

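Formally, by (the conditional version of) Chebyshev's inequality and the
union bound, we have

\begin{align*}
\mathbb{P}_n(U \le V) &\le \mathbb{P}_n\Bigl( U \le \mathbb{E}_n U - \frac{\beta_n \varepsilon}{2} \text{ or } V \ge \mathbb{E}_n V + \frac{\beta_n \varepsilon}{2} \Bigr) \\
&\le \mathbb{P}_n\Bigl( |U - \mathbb{E}_n U| \ge \frac{\beta_n \varepsilon}{2} \Bigr) + \mathbb{P}_n\Bigl( |V - \mathbb{E}_n V| \ge \frac{\beta_n \varepsilon}{2} \Bigr) \\
&\le \frac{4}{\beta_n^2 \varepsilon^2}\bigl( \operatorname{Var}_n U + \operatorname{Var}_n V \bigr). \tag{4.11}
\end{align*}

Using Lemma 4.14 we can bound these variances as

\begin{align*}
\operatorname{Var}_n U + \operatorname{Var}_n V
&= \frac{1}{n^2(n-1)^2} \operatorname{Var}_n\Bigl( \sum_{i \ne j} B_{ij}(r_i + r_j) \Bigr) + \frac{1}{n^2(n-1)^2} \operatorname{Var}_n\Bigl( \sum_{i \ne j} B_{ij}(2E + \varepsilon) \Bigr) \\
&\le \frac{1}{n^2(n-1)^2}\, O(n^3) \bigl( (4n^{\eta/2})^2 + (2E + \varepsilon)^2 \bigr) = O(n^{\eta - 1}),
\end{align*}

where we used (4.9) to bound $\operatorname{Var}_n U$. Choosing $\eta$ to be less than 1, we can
ensure $N$ is sufficiently large that for all $n \ge N$,

\[ \operatorname{Var}_n U + \operatorname{Var}_n V < \frac{\beta_n^2 \varepsilon^2 \delta}{8}. \]

This makes (4.11) into $\mathbb{P}_n(U \le V) < \delta/2$. Together with (4.10) and (4.8), this
yields the result. □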



4.6 Conclusion 

In this chapter we have defined IID interference networks with power-law
attenuation. We have shown that this setup fulfills necessary properties for the
average per-user capacity $C_\Sigma/n$ to tend in probability to $\tfrac{1}{2}\mathbb{E}\log(1 + 2\,\mathrm{SNR})$. We
have also noted that this result is not unique to our setup. We briefly mention
one more example.

Suppose, for example, that Rayleigh fading is added to our model. That is,
now let $\mathrm{SNR}_i := |H_{ii}|^2 a(\|T_i - R_i\|)$ and $\mathrm{INR}_{ji} := |H_{ji}|^2 a(\|T_i - R_j\|)$, where the
$H_{ji}$ are IID standard complex Gaussian random variables.

Because ergodic interference alignment still works with Rayleigh fading
[44, Section IV], the direct part of the theorem still holds. But also, because the
fading coefficients are IID, the independence structure from the non-fading
case remains, ensuring Lemmas 4.12-4.14 hold. Hence, the theorem is still
true.

Characterising all networks for which such a limit for average per-user 
capacity exists is an open problem. 

At the moment, Theorem 4.3 should perhaps be regarded as being of
theoretical interest. That is, our major contribution is to provide a sharp
upper bound on the performance of interference networks. However, the lower
bound relies on an ergodic interference alignment which, while rigorously
proved, may not be feasible to implement in practice for a large number of
users. Examination of the proof of the effectiveness of ergodic interference
alignment [44, Theorem 1] shows that, even for a model with alphabet size $q$,
the channel needs to be used $O((q - 1)^{n^2})$ times. Even for $n \approx 10$, this is a
prohibitive requirement. We approach this problem in the next chapter.



Notes

The new work in this chapter - specifically Theorem 4.3 and its proof, and the
definition of the IID network - is joint work with Oliver Johnson and Robert
Piechocki. This chapter is based on two papers we wrote [1, 2]. This work
benefited from the advice of anonymous reviewers from IEEE Transactions
on Information Theory and the 2010 IEEE International Symposium on
Information Theory.

The ideas in this chapter were first studied by Jafar [27] - in particular, the 
concept (although not the exact definition) of bottleneck links comes from that 
paper. 

The earlier of our two papers [1] (which actually appeared in publication 
later) considered only the standard dense network, but defined many of the 
important concepts in this chapter. The full general proof was first published 
in the later of our two papers [2] (which appeared earlier). 

With the exception of our new proof of Theorem 4.1 from our earlier paper
[1], the material on Jafar networks (Section 4.3) is due to Jafar.

Delay-rate tradeoff for ergodic
interference alignment

5.1 Introduction 

Earlier, in Section 2.8, we discussed the ergodic interference alignment scheme
of Nazer, Gastpar, Jafar, and Vishwanath (which we hereafter refer to as the
NGJV scheme).

Recall that we considered a model of communication over a finite field
$\mathbb{F}_q$ of size $q$. Since the NGJV scheme (see Section 2.8) requires a particular
$n \times n$ channel matrix with entries in $\mathbb{F}_q \setminus \{0\}$ to occur, the expected delay for
a particular message is $(q - 1)^{n^2}$ (which is roughly $q^{n^2}$ for large $q$). It is clear
that even for $n$ and $q$ relatively small, this is not a practical delay. (For $n = 6$
and $q = 3$, for example, the delay is $2^{36} \approx 7 \times 10^{10}$.)
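
The size of this delay is easy to tabulate. The following short Python sketch (the function name is ours, chosen for illustration) evaluates $(q-1)^{n^2}$ exactly using integer arithmetic.

```python
def ngjv_expected_delay(n, q):
    """Expected NGJV delay (q - 1)^(n^2) for n users over a field of size q."""
    return (q - 1) ** (n * n)


if __name__ == "__main__":
    for n, q in [(2, 3), (3, 3), (6, 3), (3, 5)]:
        print(f"n={n}, q={q}: delay = {ngjv_expected_delay(n, q):.3e}")
```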

There are five questions we would like to try to answer: 

1. Can we find a scheme that, like NGJV, achieves half the single-user rate, 
but at a lower time delay? 

2. Can we find schemes that have lower time delays than NGJV, even at 
some cost to the rate achieved? 

3. Specifically, which schemes from Question 2 perform well for situations 
where we have few users (n small)? 

4. Specifically, which schemes from Question 2 perform well for situations 
where we have many users ($n \to \infty$)?

5. What is a lower bound on the best time delay possible for any scheme
achieving a given rate for a given number of users?

In Sections 5.3 and 5.4, we define a new set of schemes, called JAP
(Subsection 5.3.1), a beamforming extension JAP-B (Subsection 5.3.4), and child
schemes derived from them (Section 5.4) that have lower time delays than
the NGJV scheme, for a variety of different rates, answering Question 2. As
a special case, examined in Subsection 5.3.5, the JAP-B([n]) schemes achieve
half the single-user rate, like NGJV, while reducing the time delay from $q^{n^2}$ to
$q^{(n-1)(n-2)}$, answering Question 1. In Section 5.5, we answer Questions 3 and
4, by finding and analysing the JAP schemes that perform the best for small
and large n; the table on page 109 and the graphs on page 110 illustrate the 
best schemes for small n, and Theorems 5.6 and 5.7 give the asymptotic be- 
haviour of the schemes. Question 5 remains an open problem (although we 
do give a lower bound on the delay achievable for the schemes listed above). 

Koo, Wu, and Gill [65] have previously attempted to answer Questions 2 
and 3. We briefly outline their work at the end of Section 5.2. 

5.2 Model 

We give our results in the context of the finite-field interference network with
fast-fading.

Recall that the single-user capacity of this channel is $\log q - H(Z) =: D(Z)$.
Extending our previous definition of degrees of freedom (Definition
1.13) to multiple users, we have the following:

Definition 5.1. Given an achievable symmetric rate point (r, r, . . . , r), we de- 
fine the symmetric per-user degrees of freedom to be dof = r/D(Z). 

In particular, it's clear that a single user can achieve 1 degree of freedom. 

We define the expected time delay for the NGJV scheme to be the average 
number of time slots we must wait after seeing a channel matrix H until we see 
the corresponding matrix I — H. The time delay is geometrically distributed 
with parameter p, where p is the probability that the random channel matrix 
takes the value I — H. The mean of this random variable is 1/p; hence the 
problem of finding the average time delay is reduced to a problem of finding 
the probability that a desired matrix appears in the next time slot. Since a 
channel matrix has $n^2$ entries, each of which needs to take the correct one
value of $q - 1$ possible values, the average time delay is

\[ D = \frac{1}{p} = (q - 1)^{n^2} \sim q^{n^2}. \tag{5.1} \]

(Here and elsewhere, we write $f(q) \sim g(q)$ if $f(q)/g(q) \to 1$ as $q \to \infty$.)

As we mentioned before, this expected delay will be quite large even for 
modest values of q and n. For this reason, we will concentrate on the delay 
exponent. 



Definition 5.2. An interference alignment scheme with expected delay $D \sim kq^T$
for some $k$ and $T$ has delay exponent $T$ and delay coefficient $k$. More
specifically, we have $T := \lim_{q \to \infty} \log D / \log q$.
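
To see the limit in this definition at work, the sketch below (illustrative only; the helper names are assumptions) evaluates $\log D / \log q$ for the NGJV delay $D = (q-1)^{n^2}$ from (5.1) as $q$ grows. The ratio approaches the delay exponent $n^2$ from below.

```python
import math


def ngjv_delay(q, n):
    """NGJV expected delay from (5.1)."""
    return (q - 1) ** (n * n)


def empirical_exponent(delay, q):
    """Finite-q approximation log D / log q to the delay exponent."""
    return math.log(delay) / math.log(q)


if __name__ == "__main__":
    n = 3
    for q in (3, 11, 101, 10 ** 4, 10 ** 6):
        print(q, round(empirical_exponent(ngjv_delay(q, n), q), 4))
```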

We regard reduction of the delay exponent as the key aim, with the delay 
coefficient playing a secondary role. In particular, the finite field model is in 
some sense an abstraction of the model where channel coefficients are Gaus- 
sians quantized into a set of size q, where q is chosen large enough to reduce 
quantization error. When q is large, the delay exponent T dominates the delay 
coefficient k in determining size of the expected delay D. 

From Theorem 2.14, we know that the NGJV schemes achieve dof = 1/2,
and we have just shown in (5.1) that it requires a delay exponent of $n^2$.

We also mention some new schemes outlined in a recent paper by Koo, 
Wu, and Gill [65]. They attempted to answer our Questions 2 and 3, by 
finding schemes - we call them KWG schemes - with lower delay than the 
NGJV scheme. The KWG schemes suggest matching a larger class of matrices 
than simply H and I — H. By analysing the hitting probability of an associated 
Markov chain, they were able to reduce the expected delay, at the cost of a re- 
duction in rate (and hence degrees of freedom). However, their schemes only 
affect the delay by a constant multiple, with the shortest-delay scheme only 
reducing the delay to $0.64(q - 1)^{n^2} \sim 0.64q^{n^2}$ with a sum-rate of $0.79D(Z)$ [65,
page 5]. That is, the KWG schemes only reduce the delay coefficient $k$, leaving
the delay exponent as $T = n^2$. For modest $q$ and $n$ (say $q = 3$, $n = 6$, again),
we regard this delay as still impractical. Since the KWG schemes achieve a 
lower rate than the NGJV scheme for the same delay exponent, we shall only 
compare our results with the NGJV scheme. 

5.3 New alignment schemes: JAP and JAP-B 
5.3.1 Three important observations 

In the NGJV scheme, all receivers were able to decode their message by sum- 
ming their two pseudomessages 

\[ \sum_{i=1}^{n} h_{ji}[t_0]\,m_i + \sum_{i=1}^{n} h_{ji}[t_1]\,m_i = m_j \quad \text{for } j = 1, \dots, n. \]

In other words, the NGJV scheme relies on the linear dependence

\[ H[t_0] + H[t_1] = I. \]

This scheme has a large delay, because, given $H[t_0]$, there is only one
matrix, $H[t_1] = I - H[t_0]$, that can complete the linear dependence. If there were
a large collection of matrices that could complete the dependence, then the
delay would be lower.

We make three observations to this end.

First, while NGJV matches two channel states $H[t_0]$ and $H[t_1]$ to form this
linear dependence, we could use more than two. That is, if we have $K + 1$
channel matrices $H[t_0], H[t_1], \dots, H[t_K]$ such that

\[ H[t_0] + H[t_1] + \cdots + H[t_K] = I, \]

then receivers can sum the $K + 1$ pseudomessages to recover their message,

\[ \sum_{i=1}^{n} h_{ji}[t_0]\,m_i + \sum_{i=1}^{n} h_{ji}[t_1]\,m_i + \cdots + \sum_{i=1}^{n} h_{ji}[t_K]\,m_i = m_j. \]

Note that the transmission of a single message is now split among $K + 1$
channel states, rather than 2 as in NGJV. This means that the degrees of freedom of
this scheme is reduced to $1/(K + 1)$ from NGJV's $1/2$.

Second, any linear combination of channel state matrices that sums to $I$ is
sufficient. That is, if there exist scalars $\lambda_0, \lambda_1 \in \mathbb{F}_q$ such that

\[ \lambda_0 H[t_0] + \lambda_1 H[t_1] = I, \]

then all receivers can recover their message by forming the linear combination
of pseudocodewords

\[ \lambda_0 \sum_{i=1}^{n} h_{ji}[t_0]\,m_i + \lambda_1 \sum_{i=1}^{n} h_{ji}[t_1]\,m_i = m_j \quad \text{for } j = 1, \dots, n. \]

Third, NGJV requires all receivers to be able to decode their messages at
the same time. However, receiver $j$ can decode its message if

\[ \sum_{i=1}^{n} h_{ji}[t_0]\,m_i + \sum_{i=1}^{n} h_{ji}[t_1]\,m_i = m_j \]

regardless of whether this equality holds for other receivers as well. In other
words, receiver $j$ can decode its message if

\begin{align*}
h_{jj}[t_0] + h_{jj}[t_1] &= 1, \\
h_{ji}[t_0] + h_{ji}[t_1] &= 0 \quad \text{for } i \ne j.
\end{align*}

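Putting these three observations together, we make the following
conclusion: Let $H[t_0], H[t_1], \dots, H[t_K]$ be a sequence of $K + 1$ channel state matrices.
If there exist scalars $\lambda_0, \lambda_1, \dots, \lambda_K$ such that for some $j$

\begin{align*}
\lambda_0 h_{jj}[t_0] + \lambda_1 h_{jj}[t_1] + \cdots + \lambda_K h_{jj}[t_K] &= 1, \tag{5.2} \\
\lambda_0 h_{ji}[t_0] + \lambda_1 h_{ji}[t_1] + \cdots + \lambda_K h_{ji}[t_K] &= 0 \quad \text{for } i \ne j, \tag{5.3}
\end{align*}

then receiver $j$ can recover its message by forming the linear combination of
pseudocodewords

\[ \lambda_0 \sum_{i=1}^{n} h_{ji}[t_0]\,m_i + \cdots + \lambda_K \sum_{i=1}^{n} h_{ji}[t_K]\,m_i = m_j. \]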



In fact, we only require

\begin{align*}
\lambda_0 h_{jj}[t_0] + \lambda_1 h_{jj}[t_1] + \cdots + \lambda_K h_{jj}[t_K] &\ne 0, \\
\lambda_0 h_{ji}[t_0] + \lambda_1 h_{ji}[t_1] + \cdots + \lambda_K h_{ji}[t_K] &= 0 \quad \text{for } i \ne j,
\end{align*}

since the coefficients $\lambda_k$ can be rescaled to make the top equation (5.2) equal
to 1 without breaking the bottom equation (5.3).

Or, writing $h_j^{\mathrm{int}}$ for the interference vector

\[ h_j^{\mathrm{int}} := (h_{j1}, \dots, h_{j,j-1}, h_{j,j+1}, \dots, h_{jn}), \]

we can again rewrite the requirement as

\begin{align*}
\lambda_0 h_{jj}[t_0] + \lambda_1 h_{jj}[t_1] + \cdots + \lambda_K h_{jj}[t_K] &\ne 0, \tag{5.4} \\
\lambda_0 h_j^{\mathrm{int}}[t_0] + \lambda_1 h_j^{\mathrm{int}}[t_1] + \cdots + \lambda_K h_j^{\mathrm{int}}[t_K] &= 0. \tag{5.5}
\end{align*}

If conditions like (5.4) and (5.5) above hold, we say that "receiver $j$ can
recover its message from $H[t_0], H[t_1], \dots, H[t_K]$."

The time delay of this scheme is $t_K - t_0$.
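
Conditions (5.4) and (5.5) can be tested directly for small examples. The following Python sketch searches exhaustively for suitable scalars; it is an illustration under our own assumptions, namely that $q$ is prime (so that arithmetic in $\mathbb{F}_q$ is arithmetic modulo $q$), that matrices are stored as nested lists with entry `H[j][i]` playing the role of $h_{ji}$, and that indices start at 0. The function name `can_recover` is hypothetical.

```python
from itertools import product


def can_recover(mats, j, q):
    """Return scalars (lambda_0, ..., lambda_K) in F_q satisfying (5.4)-(5.5)
    for receiver j, or None if no such scalars exist.

    mats is a list of K + 1 channel matrices with entries in {1, ..., q - 1};
    q is assumed prime.
    """
    n = len(mats[0])
    for lams in product(range(q), repeat=len(mats)):
        # (5.4): the combined direct coefficient must be nonzero.
        direct = sum(lam * H[j][j] for lam, H in zip(lams, mats)) % q
        # (5.5): every combined interference coefficient must vanish.
        cancelled = all(
            sum(lam * H[j][i] for lam, H in zip(lams, mats)) % q == 0
            for i in range(n)
            if i != j
        )
        if direct != 0 and cancelled:
            return lams
    return None


if __name__ == "__main__":
    H0 = [[1, 2], [1, 1]]
    H1 = [[1, 1], [2, 1]]
    print(can_recover([H0, H1], 0, 3), can_recover([H0, H1], 1, 3))
```

In this example $H_0 + H_1 = 2I$ over $\mathbb{F}_3$, so both receivers succeed with $\lambda_0 = \lambda_1 = 1$, after which the direct coefficient 2 can be rescaled to 1.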

Recall that the average delay is the reciprocal of the probability that a
random matrix allows a receiver to recover its message. Thus it will be useful to
note the following lemma.

Lemma 5.3. Conditional on the interference vectors $H_j^{\mathrm{int}}[t_0], \dots, H_j^{\mathrm{int}}[t_K]$ being
linearly dependent, the probability that receiver $j$ can recover its message is $1 - O(q^{-1})$.

Note that we only use a matrix when we know for certain that it fulfills 
our desired criteria - we merely need to know what the probability is that the 
next matrix will do, in order to calculate the expected delay. 

Proof. Since the interference vectors are linearly dependent, there exists a
linear combination

\[ \lambda_0 H_j^{\mathrm{int}}[t_0] + \lambda_1 H_j^{\mathrm{int}}[t_1] + \cdots + \lambda_K H_j^{\mathrm{int}}[t_K] = 0, \]

where $L > 0$ of the $\lambda_k$ are nonzero. Thus, receiver $j$ can recover its message
provided that the corresponding linear combination

\[ \lambda_0 H_{jj}[t_0] + \lambda_1 H_{jj}[t_1] + \cdots + \lambda_K H_{jj}[t_K] \tag{5.6} \]

is nonzero; call the probability that this happens $p$.

When $\lambda_k \ne 0$, then $\lambda_k H_{jj}[t_k] =: V_k$ is uniform on $\mathbb{F}_q \setminus \{0\}$, and when
$\lambda_k = 0$, then $\lambda_k H_{jj}[t_k] = 0$ too. So (5.6) is the sum of $L$ random variables
$V_k$ IID uniform on $\mathbb{F}_q \setminus \{0\}$. We can write the mass function of each $V_k$ as
$(1 + \rho)U - \rho\delta_0$, where $U$ is uniform on $\mathbb{F}_q$, $\delta_0$ is a point mass on 0, and
$\rho = 1/(q - 1)$. Then the mass function of the $L$-fold convolution is

\[ \bigl(1 - (-\rho)^L\bigr)U + (-\rho)^L \delta_0. \]

Hence, the probability that (5.6) is zero is

\[ 1 - p = \frac{1}{q}\bigl(1 - (-\rho)^L\bigr) + (-\rho)^L = O(q^{-1}). \]

The result follows. □
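
The convolution formula in this proof can be verified numerically. The sketch below is an illustration only: it takes $q$ prime, so that addition in $\mathbb{F}_q$ is addition modulo $q$, and the function names are our own. It computes the exact probability that a sum of $L$ IID variables uniform on the nonzero elements equals zero, and compares it with $\frac{1}{q}(1 - (-\rho)^L) + (-\rho)^L$.

```python
from fractions import Fraction


def prob_sum_zero(q, L):
    """Exact P(V_1 + ... + V_L = 0 mod q) for V_k IID uniform on {1, ..., q - 1}."""
    dist = [Fraction(0)] * q
    dist[0] = Fraction(1)  # the empty sum is 0 with probability 1
    step = Fraction(1, q - 1)
    for _ in range(L):
        new = [Fraction(0)] * q
        for s, p in enumerate(dist):
            for v in range(1, q):
                new[(s + v) % q] += p * step
        dist = new
    return dist[0]


def closed_form(q, L):
    """Mass at zero of (1 - (-rho)^L) U + (-rho)^L delta_0, with rho = 1 / (q - 1)."""
    rho = Fraction(1, q - 1)
    return (1 - (-rho) ** L) / q + (-rho) ** L


if __name__ == "__main__":
    for q in (3, 5, 7):
        for L in (1, 2, 3, 4):
            print(q, L, prob_sum_zero(q, L), closed_form(q, L))
```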



5.3.2 The scheme JAP(a) 

We now present our new scheme. 

The idea behind the scheme is as follows: We start by seeing some channel
state $H[t_0]$. We then set $t_1$ to be the first time slot that allows receivers $1$ to $a_1$
to recover their message (where $a_1$ is decided on in advance). Next, we set $t_2$
to be the first time slot that allows the next $a_2$ receivers to recover
their message. And so on, until all $n$ receivers have recovered their message.

Specifically, fix $K \le n$ and a sequence $[a_1, a_2, \dots, a_K] =: a$ of length $K$ and
weight $n$; that is, in the set

\[ A(n, K) := \bigl\{ a \in \mathbb{Z}_+^K : a_1 + a_2 + \cdots + a_K = n \bigr\}. \]

We write $A_k$ for the partial sums $A_k := a_1 + a_2 + \cdots + a_k$ (so in particular
$A_1 = a_1$ and $A_K = n$).

Then we define the scheme JAP(a) as consisting of the following $K + 1$
steps:

Step 0: Start with a matrix $H[t_0]$.

Step 1: Set $t_1$ to be the first timeslot that allows the first $a_1$ receivers $1, 2, \dots, A_1$
to recover their message from $H[t_0], H[t_1]$.

Step k: Set $t_k$ to be the first timeslot that allows the next $a_k$ receivers $A_{k-1} + 1,
A_{k-1} + 2, \dots, A_k$ to recover their message from $H[t_0], H[t_1], \dots, H[t_k]$.

Step K: Set $t_K$ to be the first timeslot that allows the final $a_K$ receivers $A_{K-1} + 1,
A_{K-1} + 2, \dots, A_K$ to recover their message from $H[t_0], H[t_1], \dots, H[t_K]$.

By the end of this process, all $n = A_K$ receivers have recovered their message.

Since the message was split over $K + 1$ time slots, the common rate of
communication is $D(Z)/(K + 1)$, which corresponds to dof $= 1/(K + 1)$.




5.3.3 Delay exponent of JAP schemes

We now examine the delay exponent for our new schemes.

Theorem 5.4. Consider the $n$-user finite field interference network. Fix $K$ and
$a \in A(n, K)$. We use the scheme JAP(a) as outlined above. Then

1. the expected time for the $k$th round to take place is $D_k \sim q^{T_k(a)}$, where
$T_k(a) = a_k(n - k - 1)$;

2. the delay exponent for the whole scheme is

\[ T(a) := \max_{1 \le k \le K} T_k(a) = \max_{1 \le k \le K} a_k(n - k - 1). \]
Proof. Recall that the expected delay is the reciprocal of the probability the
desired match can be made.

Suppose we are about to begin stage $k$ of a scheme JAP(a). By Lemma
5.3, the probability we can complete the stage is $1 - O(q^{-1})$ multiplied by the
probability that the interference vectors for the next $a_k$ receivers

\[ H_j^{\mathrm{int}}[t_0], H_j^{\mathrm{int}}[t_1], \dots, H_j^{\mathrm{int}}[t_k] \]

are linearly dependent.

If the earlier interference vectors, at times $t_0, \dots, t_{k-1}$, are already linearly
dependent, then we are done (with high probability, by Lemma 5.3). Assume they are not.

Write $\mathcal{S}$ for the span of these earlier interference vectors for one of the
desired $a_k$ receivers $j$, so

\[ \mathcal{S} := \operatorname{span}\bigl\{ H_j^{\mathrm{int}}[t_0], \dots, H_j^{\mathrm{int}}[t_{k-1}] \bigr\}. \]

Since all possible interference vectors in $(\mathbb{F}_q \setminus \{0\})^{n-1}$ are equally likely, the
probability that the next matrix completes a linear dependence is

\[ \frac{\bigl|\mathcal{S} \cap (\mathbb{F}_q \setminus \{0\})^{n-1}\bigr|}{\bigl|(\mathbb{F}_q \setminus \{0\})^{n-1}\bigr|} = \frac{s\,q^k}{(q - 1)^{n-1}}, \]

where $s$ is the proportion of vectors in $\mathcal{S}$ with no zero entries. By counting the
possible coefficients in $\mathbb{F}_q$ used in the span, the inclusion-exclusion formula
gives us

\[ s = 1 - (n - 1)\frac{1}{q} + O(q^{-2}) = 1 - O(q^{-1}). \]

Hence, the desired probability is

\[ \frac{q^k}{(q - 1)^{n-1}}\bigl(1 - O(q^{-1})\bigr) = q^{-(n-k-1)}\bigl(1 - O(q^{-1})\bigr) \]

(where the $1 - O(q^{-1})$ term comes from Lemma 5.3).

This property that a linear dependence is completed must hold for all $a_k$
receivers, which happens with probability $(q^{-(n-k-1)})^{a_k} = q^{-a_k(n-k-1)}$; hence
the first result.

For the second result, note that, as $q \to \infty$, the delay is dominated by the
delay for the slowest round. □
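
The delay exponent in Theorem 5.4 is simple to compute for any given sequence. The sketch below is a direct transcription of $T(a) = \max_k a_k(n - k - 1)$; the function name and the input check are our own additions.

```python
def jap_delay_exponent(a, n):
    """Delay exponent T(a) = max_k a_k (n - k - 1) of the scheme JAP(a).

    a is a sequence of K positive integers summing to n (rounds k = 1, ..., K).
    """
    if sum(a) != n or any(a_k < 1 for a_k in a):
        raise ValueError("a must be a sequence of positive integers with sum n")
    return max(a_k * (n - k - 1) for k, a_k in enumerate(a, start=1))


if __name__ == "__main__":
    n = 6
    for a in ([6], [3, 3], [2, 2, 2], [1, 2, 3]):
        print(a, "dof = 1/%d" % (len(a) + 1), "T(a) =", jap_delay_exponent(a, n))
```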

5.3.4 Improving delay with beamforming: JAP-B

Beamforming slightly improves the performance of JAP(a) schemes, combining
ideas from the original Cadambe-Jafar interference alignment [28] with
the JAP scheme.

In round $k$ we can guarantee that the interference matches up for receiver
$j := A_{k-1} + 1$. Each transmitter $i$, instead of repeating their message $m_i$, rather
encodes $(h_{ji}[t_k])^{-1} h_{ji}[t_0]\,m_i$. (Since the coefficient $h_{ji}$ cannot be $0$ and $q$ is
prime, the inverse term certainly exists.) The total received interferences at
receiver $j$ at times $t_0$ and $t_k$ are both equal to $\sum_{i \ne j} h_{ji}[t_0]\,m_i$, so can be estimated
and cancelled.

We refer to such schemes that take advantage of beamforming as JAP-B(a)
schemes.

Theorem 5.5. The delay exponent of a JAP-B(a) scheme indexed by sequence $a$ is

\[ T_B(a) := \max_{1 \le k \le K} (a_k - 1)(n - k - 1). \]

Proof. At each round, receiver $j = A_{k-1} + 1$ will automatically be able to
recover its message, leaving the JAP scheme to align interference for the other
$a_k - 1$ users. (Independence of the coefficients $h_{ji}$ ensures that the scheme still
has the same problem to solve.) □

In particular, the JAP-B scheme will always outperform the JAP scheme 
with the same sequence a. 

5.3.5 An interesting special case: JAP-B([n])

An interesting special case of the JAP-B schemes is the case when $K = 1$ and
$a_1 = n$; we call this scheme JAP-B([n]).

In this case we have $1/(K + 1) = 1/2$ degrees of freedom for a rate of
$D(Z)/2$. From Theorem 5.5, we see that the delay exponent is

\[ T_B([n]) = (a_1 - 1)(n - 1 - 1) = (n - 1)(n - 2). \]

Effectively, the JAP-B([n]) scheme works by using beamforming to
automatically cancel transmitter 1's interference, then for users $2, 3, \dots, n$ requiring
the existence of diagonal matrices $D_0, D_1$ such that $D_0 H[t_0] + D_1 H[t_1] = I$.

Note that this is the same rate as is achieved by the original NGJV scheme,
but that the delay exponent has been reduced from NGJV's $n^2$ to

\[ (n - 1)(n - 2) = n^2 - (3n - 2). \]

For small $n$ in particular, this is a worthwhile improvement (see figure, p. 110).
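
To make the comparison concrete, the following sketch (function name ours, for illustration) evaluates the JAP-B delay exponent $\max_k (a_k - 1)(n - k - 1)$ and tabulates the saving of JAP-B([n]) over the NGJV exponent $n^2$ for small $n$.

```python
def jap_b_delay_exponent(a, n):
    """Delay exponent max_k (a_k - 1)(n - k - 1) of the scheme JAP-B(a)."""
    if sum(a) != n:
        raise ValueError("the entries of a must sum to n")
    return max((a_k - 1) * (n - k - 1) for k, a_k in enumerate(a, start=1))


if __name__ == "__main__":
    print(" n  NGJV  JAP-B([n])  saving")
    for n in range(3, 11):
        t_b = jap_b_delay_exponent([n], n)
        print(f"{n:2d}  {n * n:4d}  {t_b:10d}  {n * n - t_b:6d}")
```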

5.4 Child schemes: using time-sharing 

Another way to generate new alignment schemes is by time-sharing schemes 
designed for a smaller number of users. 

Call the NGJV, KWG, JAP and JAP-B schemes 'parent schemes'. Given
a parent scheme for the $m$-user network, we can modify it for any $n$-user
network with $n > m$, giving what we call a 'child scheme'.

Specifically, we use resource division by time (see Subsection 2.6.1) to split
the network into $\binom{n}{m}$ subnetworks, each of which contains a unique collection
of just $m < n$ of the users. Within each of these $m$-user subnetworks, a parent
scheme is used, while the other $n - m$ transmitters remain silent.

Resource division by time is often known as time-division multiple access or 
TDMA - we use that abbreviation in the rest of this chapter. 

Such a child scheme clearly has the same delay exponent as the parent
scheme, with the rate - and thus the degrees of freedom - reduced by a factor
of $m/n$. So an $m$-user JAP-B scheme shared between $n$ users gives dof $=
m/(n(K + 1))$.

In particular, time-sharing the NGJV schemes for smaller networks gives
a collection of schemes with a lower delay exponent $m^2 < n^2$ than the main
NGJV scheme for a given number of users, reducing the degrees of freedom
from $1/2$ to $m/2n$.
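
The factor $m/n$ comes from counting how often each user is active: a given user belongs to $\binom{n-1}{m-1}$ of the $\binom{n}{m}$ subnetworks. The sketch below (names are ours, for illustration) computes this fraction exactly and the resulting degrees of freedom for a parent scheme with dof $1/(K+1)$.

```python
from fractions import Fraction
from math import comb


def child_scheme(m, n, K):
    """Return (number of subnetworks, per-user dof) when an m-user parent scheme
    with dof 1/(K + 1) is time-shared equally among n > m users."""
    subnetworks = comb(n, m)
    # Fraction of subnetworks (hence of time) in which a given user is active.
    active = Fraction(comb(n - 1, m - 1), subnetworks)
    return subnetworks, active * Fraction(1, K + 1)


if __name__ == "__main__":
    # Time-sharing a 3-user parent scheme with K = 1 (dof 1/2) among 6 users.
    print(child_scheme(3, 6, 1))
```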

(We are not aware that the idea of time-sharing NGJV schemes has previ- 
ously appeared in the literature. However, the idea seems simple enough that 
we regard this as the 'current benchmark' against which we should compare 
our new schemes.) 

Interestingly, it seems that child schemes derived from time-sharing an
NGJV-like JAP-B([n]) parent scheme are particularly effective, and very often
perform better than other JAP-B schemes. We discuss this point further in
the next section.

5.5 Best schemes 
5.5.1 General case 

Given a number of users n and a desired number of degrees of freedom dof =
1/(K + 1), we wish to find a scheme with the lowest delay exponent.

For K = n - 1 or n, when dof = 1/n or 1/(n + 1), the best JAP-B schemes
have delay exponent $T_B([1, \dots, 1, 2]) = T_B([1, \dots, 1, 1, 1]) = 0$. This is the
same delay exponent as TDMA, which has dof = 1/n also. Thus we need not
consider schemes with K = n - 1 or n.

For $K \le n - 2$ the best parent scheme will be a JAP-B scheme with param-
eter vector $a \in A(n,K)$. We write T(n,K) for this best delay exponent, that
is

$$T(n,K) := \min_{a \in A(n,K)} T_B(a) = \min_{a \in A(n,K)} \max_{1 \le k \le K} (a_k - 1)(n - k - 1).$$

We can bound T(n, K) as follows.

Theorem 5.6. Fix n and $K \le n - 2$. For T(n, K) as defined above, we have the
following bounds:

$$\frac{n}{K}(n-2) - (2n-K-2) \le T(n,K) \le \frac{n}{K}(n-2).$$

The gap between the upper and lower bounds grows linearly with n. 
The following lemma on partial harmonic sums will be useful. 

Lemma 5.7. Let S(n, K) be the partial harmonic sum

$$S(n,K) := \sum_{k=1}^{K} \frac{1}{n-k-1} = \frac{1}{n-2} + \cdots + \frac{1}{n-K-1}.$$

Then we have the bounds

$$\frac{K}{n-2} \le S(n,K) \le \frac{K}{n-K-2}.$$

Of course, tighter bounds are available by comparing $\sum 1/k$ to $\int 1/x \, dx$,
but this suffices for our needs.

Proof. There are K terms in the sum, the largest of which is at most 1/(n - K - 2) and
the smallest of which is 1/(n - 2). □

We can now prove Theorem 5.6. 

Proof of Theorem 5.6. The value of T(n, K) is lower-bounded by the value of the
same minimisation problem relaxed to allow the $a_k$ to be real. That is,

$$T(n,K) = \min_{a \in A(n,K)} \max_k (a_k - 1)(n-k-1) \ge \min_{a \in \mathbb{R}^K :\, \sum_k a_k = n} \max_k (a_k - 1)(n-k-1).$$

The relaxed problem is solved by waterfilling, setting $a_k - 1 = c/(n-k-1)$.
Requiring the weight of a to be n forces

$$c = \frac{n-K}{S(n,K)} \ge \frac{(n-K)(n-K-2)}{K},$$

where we have used Lemma 5.7. Rearrangement gives the lower bound.

An upper bound is obtained by using the same c and taking integer $a_k$ with

$$a_k - 1 < \frac{c}{n-k-1} + 1.$$

This gives

$$T_B(a) \le c + \max_k (n-k-1) = \frac{n-K}{S(n,K)} + (n-2) \le \frac{(n-K)(n-2)}{K} + (n-2),$$

where we have used Lemma 5.7. Rearrangement gives the upper bound. □
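For small n the bounds of Theorem 5.6 can be checked numerically by brute force. The sketch below assumes, as the proof's "weight of a" suggests, that A(n, K) is the set of vectors of K positive integers summing to n, and that the delay exponent is $\max_k (a_k - 1)(n - k - 1)$; the function names are illustrative only.

```python
from itertools import combinations


def compositions(n, K):
    """Yield every length-K tuple of positive integers summing to n."""
    for cuts in combinations(range(1, n), K - 1):
        edges = (0,) + cuts + (n,)
        yield tuple(edges[i + 1] - edges[i] for i in range(K))


def delay_exponent(a, n):
    """max over k = 1..K of (a_k - 1)(n - k - 1)."""
    return max((a_k - 1) * (n - k - 1) for k, a_k in enumerate(a, start=1))


def best_delay_exponent(n, K):
    """T(n, K): the smallest delay exponent over all parameter vectors."""
    return min(delay_exponent(a, n) for a in compositions(n, K))


def theorem_bounds(n, K):
    """Lower and upper bounds of Theorem 5.6, as floats."""
    upper = n * (n - 2) / K
    return upper - (2 * n - K - 2), upper


for n, K in [(5, 2), (6, 3), (8, 2)]:
    lower, upper = theorem_bounds(n, K)
    print(n, K, lower, best_delay_exponent(n, K), upper)
```

The search is exponential in K, so it is only practical for the small networks considered in the next subsection.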



5.5.2 Few users: small n 

For small values of n, we can find the best parent JAP-B schemes by hand.
(The task is simplified by noting that the optimal a will be nonzero and in-
creasing in k.) The table below gives the delay exponents of the best JAP-B
schemes for n = 3, ..., 8 and $K \le n - 2$.

Best JAP-B(a) schemes for small values of n and K, and their delay exponents.
Each entry gives the delay exponent followed by the parameter vector a.

K, dof | n = 3 | n = 4 | n = 5 | n = 6 | n = 7 | n = 8
K = 1, dof = 1/2 | 2 [3] | 6 [4] | 12 [5] | 20 [6] | 30 [7] | 42 [8]
K = 2, dof = 1/3 | TDMA | 2 [1,3] | 4 [2,3] | 8 [3,3] | 12 [3,4] | 18 [4,4]
K = 3, dof = 1/4 | - | TDMA | 2 [1,1,3]* | 4 [1,2,3]* | 6 [2,2,3] | 8 [2,3,3]
K = 4, dof = 1/5 | - | - | TDMA | 2 [1,1,1,3]* | 4 [1,1,2,3]* | 6 [1,2,2,3]*
K = 5, dof = 1/6 | - | - | - | TDMA | 2 [1,1,1,1,3]* | 4 [1,1,1,2,3]*
K = 6, dof = 1/7 | - | - | - | - | TDMA | 2 [1,1,1,1,1,3]*
K = 7, dof = 1/8 | - | - | - | - | - | TDMA



Asterisks mean that the choice of a achieving this delay exponent is non-unique.

[Figure: Delay-rate tradeoff for TDMA, NGJV and JAP-B schemes for n-user interference networks. Each panel plots delay exponent against degrees of freedom for one value of n, marking parent and child schemes of four types: TDMA, NGJV, JAP-B([n]) and other JAP-B. The n = 5, K = 2 scheme is labelled.]

We can also consider child schemes based on parent JAP-B schemes. The
figure on page 110 plots the performance of NGJV and all JAP-B schemes, as
well as child schemes derived from them, for n = 3, ..., 7. Note that for many
values of n and dof, the scheme with the lowest delay exponent is JAP-B([n])
or one of the child schemes derived from it. (Note, however, that the parent
schemes with n = 5, K = 2 and n = 8, K = 2, as well as child schemes derived
from them, outperform JAP-B([n]) for some degrees of freedom.)

5.5.3 Many users: $n \to \infty$

We now consider the performance of schemes in the many-user limit $n \to \infty$.

In particular, we are interested in two limiting regimes, specifying how the
degrees of freedom dof(n) should scale with the number of users n. In regime
I the per-user rate is held constant; in regime II the sum-rate is kept constant,
so each user's individual rate falls like 1/n.

• Regime I, where we hold the degrees of freedom constant as $n \to \infty$.
That is, we want to communicate at a fixed fraction of the single-user rate,
as in the NGJV scheme. In regime I, we take dof(n) = $\alpha$ for some
$\alpha \in (0, 1/2]$. (The NGJV scheme corresponds to $\alpha = 1/2$.)

• Regime II, where we allow the degrees of freedom to fall as the number
of users increases, scaling like 1/n. That is, we want to communicate
at a fixed multiple of the rate allowed by resource division schemes like
TDMA. In regime II, we take dof(n) = $\beta/n$ for some $\beta > 1$. (TDMA
corresponds to $\beta = 1$.)

First, we consider how parent JAP-B schemes perform in the many-user 
limit. 

Theorem 5.8. For regimes I and II, as above, and as $n \to \infty$, we have the following
results for the delay exponent T(n) of parent JAP-B schemes:

• Regime I: Fix $\alpha \in (0, 1/2]$. Then the delay exponent for dof(n) = $\alpha$ scales
quadratically like

$$T(n) \sim \frac{n^2}{\lfloor 1/\alpha \rfloor - 1}.$$

• Regime II: Fix $\beta > 1$. Then the delay exponent for dof(n) = $\beta/n$ scales
linearly, in that T(n) = O(n), or more specifically,

$$\left(\beta + \frac{1}{\beta} - 2\right) n - o(n) \le T(n) \le \beta n + o(n).$$
Proof. For regime I, we need dof = 1/(K + 1) $\ge \alpha$, so we take

$$K = \left\lfloor \frac{1}{\alpha} \right\rfloor - 1.$$

But the general bounds on delay exponents from Theorem 5.6 tell us that for
fixed K we have $T(n, K) \sim n^2/K$. The result follows.

For regime II, 1/(K + 1) = dof = $\beta/n$, so we need

$$K = \left\lfloor \frac{n}{\beta} \right\rfloor + O(1).$$

Hence

$$\frac{n}{K} = \frac{n}{n/\beta + O(1)} = \beta + o(1).$$

Putting this into the bounds from Theorem 5.6 gives

$$(\beta + o(1))(n-2) - \left(2n - \frac{n}{\beta} + O(1)\right) \le T(n) \le (\beta + o(1))(n-2).$$

Rearranging gives the result. □

Note that in regime I with $\alpha = 1/2$, we get $T(n) \sim n^2$, the same as NGJV.

We noted previously that child schemes produced by sharing the parent 
scheme JAP-B([m]) were particularly effective. The following theorem shows 
this. 

Theorem 5.9. For regimes I and II, as above, and as $n \to \infty$, we have the following
results for the delay exponent T(n) of child schemes based on JAP-B([m]) parent
schemes:

• Regime I: Fix $\alpha \in (0, 1/2]$. Then the delay exponent for dof(n) = $\alpha$ scales
quadratically, in that

$$T(n) = 4\alpha^2 n^2 - 6\alpha n + O(1) \sim 4\alpha^2 n^2.$$

• Regime II: Fix $\beta > 1$. Then the delay exponent for dof(n) = $\beta/n$ is constant,
in that

$$T(n) = (\lfloor 2\beta \rfloor - 1)(\lfloor 2\beta \rfloor - 2).$$

Proof. Recall from Section 5.4 that sharing the scheme JAP-B([m]) amongst n
users gives dof = m/2n for delay exponent T = (m - 1)(m - 2).

For regime I, note that m/2n = dof(n) = $\alpha$, so we need to take $m = \lfloor 2\alpha n \rfloor$,
giving $T(n) = (\lfloor 2\alpha n \rfloor - 1)(\lfloor 2\alpha n \rfloor - 2)$. The result follows.

For regime II, note that m/2n = dof(n) = $\beta/n$, so we need to take
$m = \lfloor 2\beta \rfloor$, giving $T(n) = (\lfloor 2\beta \rfloor - 1)(\lfloor 2\beta \rfloor - 2)$. □

Note that this means that in both regimes child schemes
from JAP-B([m]) parent schemes are asymptotically more effective than any
other parent scheme. This is because

$$4\alpha^2 \le \frac{1}{\lfloor 1/\alpha \rfloor - 1}$$

(with strict inequality unless $\alpha = 1/2$, when no child scheme will achieve the
desired degrees of freedom) and any constant is less than $(\beta + 1/\beta - 2)n$ for n
sufficiently large.

Note also that by the same argument as the above proof, sharing the NGJV
parent scheme gives $T(n) = 4\alpha^2 n^2$ in regime I, which is less good than sharing
JAP-B([m]), but the same to first-order terms.
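The two regimes are easy to compare numerically. The sketch below takes the delay exponents quoted above, $(m-1)(m-2)$ for a JAP-B([m]) parent and $m^2$ for an m-user NGJV parent, together with the parent sizes $m = \lfloor 2\alpha n \rfloor$ and $m = \lfloor 2\beta \rfloor$ from the proof of Theorem 5.9. The function names are illustrative, and exact fractions are used for $\alpha$ so that the floor is not disturbed by rounding.

```python
from fractions import Fraction
from math import floor


def japb_child_delay(m):
    """Delay exponent (m - 1)(m - 2) of a child of the JAP-B([m]) parent."""
    return (m - 1) * (m - 2)


def ngjv_child_delay(m):
    """Delay exponent m^2 of a child of the m-user NGJV parent."""
    return m * m


def parent_size_regime_I(alpha, n):
    """Regime I: dof(n) = alpha, so m = floor(2 * alpha * n)."""
    return floor(2 * alpha * n)


def parent_size_regime_II(beta):
    """Regime II: dof(n) = beta / n, so m = floor(2 * beta), independent of n."""
    return floor(2 * beta)


alpha, n = Fraction(1, 4), 100
m1 = parent_size_regime_I(alpha, n)
print("regime I :", m1, japb_child_delay(m1), ngjv_child_delay(m1))

m2 = parent_size_regime_II(3)
print("regime II:", m2, japb_child_delay(m2))
```

With $\alpha = 1/4$ and n = 100 the two child schemes differ only in the lower-order term $-6\alpha n + 2$, in line with the remark that they agree to first order.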

5.6 Conclusion 

In Section 5.1, the questions we attempted to answer were:

1. Can we find a scheme that, like NGJV, achieves half the single-user rate, 
but at a lower time delay? 

2. Can we find schemes that have lower time delays than NGJV, even at 
some cost to the rate achieved? 

3. Specifically, which schemes from Question 2 perform well for situations 
where we have few users (n small)? 

4. Specifically, which schemes from Question 2 perform well for situations
where we have many users ($n \to \infty$)?

5. What is a lower bound on the best time delay possible for any scheme 
achieving a given rate for a given number of users? 

In answer to question 2, we defined the new sets of parent schemes JAP
and the even more effective JAP-B, and also derived child schemes from them.
We noted that these had lower time delays - and sometimes significantly
lower - at the cost of some loss in rate (or equivalently degrees of freedom).
We saw that the child schemes from JAP-B([n]) schemes were often particu-
larly effective.

In answer to question 1, we noted that the JAP-B([n]) schemes keep the
degrees of freedom at 1/2 while reducing the delay exponent from $n^2$ to $(n-1)(n-2)$.

In answer to questions 3 and 4, we explicitly found the best JAP-B
schemes for $n \le 8$, and analysed the asymptotic behaviour of our schemes as
$n \to \infty$.

Question 5 remains an open problem. 
Notes 

This chapter is joint work with Oliver Johnson and Robert Piechocki, and is
based on our paper [3].

The NGJV scheme is due to Nazer, Gastpar, Jafar, and Vishwanath [44].
The delay-rate tradeoff problem was first studied by Koo, Wu, and Gill
[65].



6 Interference, group testing, and channel coding

6.1 Building the interference graph 

In Section 2.6, we looked at resource division schemes such as resource di-
vision by time (Subsection 2.6.1). We noted that for a user to communicate
without interference, it was necessary for all of the other users to stand idle.
However, if not every receiver gets interference from every transmitter, then it
might be possible for more than one user to communicate through the net-
work at once.

For concreteness, consider the N-user finite field interference network with
fixed fading, so

$$y_{jt} = \sum_{i=1}^{N} h_{ji} x_{it} + Z_{jt} \pmod q.$$

For the moment, so that we can concentrate on the interference, we will as-
sume that the noise Z is 0 with probability 1, so

$$y_{jt} = \sum_{i=1}^{N} h_{ji} x_{it} \pmod q.$$

If we have for some $i \ne j$ that $h_{ji} = h_{ij} = 0$, then both users i and j can
communicate simultaneously and interference-free. (Recall that we use the
word "user" to mean a transmitter-receiver pair. So we mean that if
$h_{ji} = h_{ij} = 0$, then transmitter i can communicate to receiver i and simultaneously
transmitter j can communicate to receiver j, both links without interference.)

More generally, we can build the interference graph to show which users
interfere with which.

Definition 6.1. The interference graph of an N-user interference network with
fixed fading has vertex set

$$V := \{1, 2, \dots, N\}$$

and edge set

$$E := \{ij : h_{ji} \ne 0 \text{ or } h_{ij} \ne 0\}.$$

The figure below shows the nonzero links in a network (transmitters on 
the left; receivers on the right), and the interference graph derived from it. 



[Figure: Network (only nonzero links shown) and the interference graph derived from it.]



So two users i and j can communicate simultaneously and interference-free
if they are not joined by an edge in the interference graph. More generally, any
independent set of the interference graph can communicate at the same time.
(Recall that a set of vertices $W \subset V$ is called independent if no edge in the graph
joins one vertex in W to another.)

Optimal use of such a resource division strategy requires operating us-
ing only maximal independent sets. In particular, the maximum degrees of
freedom achievable by a resource division scheme is dof = $\alpha$, where the inter-
ference number $\alpha$ of the graph is the size of the largest independent set.
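As a small illustration, the interference graph and a largest independent set can be computed directly from the fading coefficients. The sketch below is not from the thesis: it assumes users are numbered from 0, that `h[j][i]` holds the coefficient from transmitter i to receiver j, and it finds a largest independent set by exhaustive search, which is only feasible for small N.

```python
from itertools import combinations


def interference_graph(h):
    """Edges {i, j}, i != j, with h[j][i] != 0 or h[i][j] != 0."""
    N = len(h)
    return {frozenset((i, j)) for i, j in combinations(range(N), 2)
            if h[j][i] != 0 or h[i][j] != 0}


def is_independent(users, edges):
    """True if no two of the given users are joined by an edge."""
    return all(frozenset(pair) not in edges for pair in combinations(users, 2))


def largest_independent_set(N, edges):
    """Exhaustive search, trying the biggest candidate sets first."""
    for size in range(N, 0, -1):
        for users in combinations(range(N), size):
            if is_independent(users, edges):
                return set(users)
    return set()


# Four users: transmitter 1 is heard at receiver 0, transmitter 3 at receiver 2.
h = [[1, 2, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 0, 1]]
edges = interference_graph(h)
best = largest_independent_set(len(h), edges)
print(sorted(map(sorted, edges)), best)
```

In this example users 0 and 2 can be active together, so two users at a time can share each timeslot instead of one.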



[Figure: an independent set and a maximal independent set of the interference graph.]



Given such a network, the users need to find out which other users they
interfere with. Here is one method they could use to do so: Let each trans-
mitter i choose a random nonzero message $m_i \in \mathbb{F}_q \setminus \{0\}$. Then, for each of T
timeslots, each transmitter i either sends $m_i$ or just sends the empty message 0.
In other words, let $x_{it} = 1$ denote that transmitter i communicates in timeslot
t, and $x_{it} = 0$ that she does not. Then each receiver j receives the signal

$$y_{jt} = \sum_{i=1}^{N} h_{ji} x_{it} m_i \pmod q.$$

For large q, the probability that $y_{jt} = 0$ when at least one interfering transmitter has
$x_{it} = 1$ is small - for the moment we neglect this. Then if $y_{jt} = 0$, receiver j
knows that $h_{ji} = 0$ for all i with $x_{it} = 1$; conversely, if $y_{jt} \ne 0$, receiver j knows
that $h_{ji} \ne 0$ for at least one i with $x_{it} = 1$.

Receiver j wants to discover which $h_{ji}$ are nonzero (but doesn't need to
know their actual values) in as few tests T as possible - this T will depend on
the number of users N, the number K of transmitters i for which $h_{ji} \ne 0$, and
the acceptable error probability $\epsilon$. This is equivalent to the problem of group
testing, which we will outline more fully in the next section.
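The discovery procedure can be simulated for a single receiver. The sketch below is an illustration under several assumptions not fixed by the text: q is prime, each transmitter is active in each timeslot independently with probability p, and the receiver uses only the silent timeslots, striking out every transmitter that was active when nothing was heard. Signal cancellation is neglected, as in the text.

```python
import random


def run_discovery(h_row, q, T, p, rng):
    """Return the transmitters receiver j cannot rule out after T timeslots.

    h_row[i] is the coefficient h_ji in F_q (q assumed prime).
    """
    N = len(h_row)
    messages = [rng.randrange(1, q) for _ in range(N)]   # nonzero m_i
    candidates = set(range(N))
    for _ in range(T):
        active = [i for i in range(N) if rng.random() < p]
        y = sum(h_row[i] * messages[i] for i in active) % q
        if y == 0:
            # Silence: every active transmitter has h_ji = 0
            # (neglecting the small chance that signals cancel).
            candidates -= set(active)
    return candidates


rng = random.Random(0)
h_row = [0, 0, 5, 0, 0, 0]          # only transmitter 2 interferes
remaining = run_discovery(h_row, q=101, T=30, p=0.3, rng=rng)
print(remaining)
```

With a single interferer no cancellation is possible, so transmitter 2 always survives; how many of the others are eliminated depends on T and p.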

Our model would be more accurate if we included a noise term Z that
wasn't always 0 and if we did not neglect the possibility that signals cancel
each other out. Thus we would like our group testing protocols to be robust
to this noise and to still have a small probability of error $\epsilon$. In this chapter,
we investigate how a channel coding approach to group testing can help with
this.

Similar problems in multiuser networks have also been studied from a 
group testing perspective. Berger and coauthors [66] and Capetanakis [67] 
studied this problem using a model where more than one interfering message 
results in a collision where all messages are lost (rather than our model where 
signals are superposed at the receiver). Zhang, Luo, and Guo [68] studied the
Gaussian network, where low-interference links $h_{ji} \approx 0$ are assumed to be
zero and those signals are treated as noise.

In the rest of this chapter, we outline the problem of group testing, and explore 
a new approach to it using techniques from channel coding (as outlined in 
Chapter 1). 

We define for the first time group testing channels, which operate much like 
communications channels, and identify an important property where 'only 
defects matter' that allows us to prove a theorem on the number of tests 
needed for accurate group testing to be possible. We also give the first in- 
formation theoretic bound on adaptive group testing, by drawing an analogy 
to channel coding with feedback.

6.2 Group testing: a very short introduction

The problem of group testing concerns detecting the defective members of a
set of items through the means of pooled tests. (In our previous example, for
receiver j, think of the 'defective items' as the interfering transmitters with
$h_{ji} \ne 0$.) Group testing as a subject dates back to the work of Dorfman [69] in
the 1940s studying practical ways of testing soldiers' blood for syphilis, and has
received much attention from combinatorialists and probabilists since.

The setup is as follows: Suppose we have a set of N items, of which a
subset $\mathcal{K}$ of size K is defective. To identify $\mathcal{K}$, we could test each of the N
items individually for defectiveness. However, when K is small compared to
N, most of the tests will give negative results. A less wasteful method is to
test pools of numerous items together at the same time. After a number T of
such pooled tests, it should be possible to deduce which items were defective.

Let $x_{it} = 1$ denote that item i is included in test t. In the so-called determin-
istic case, a test of $n_t := |\{i : x_{it} = 1\}|$ items of which $k_t := |\{i \in \mathcal{K} : x_{it} = 1\}|$
are defective gives a negative result $y_t = 0$ if no defects are tested ($k_t = 0$) and
a positive result $y_t = 1$ if at least one defect is pooled into the test ($k_t \ge 1$).

After T tests, we make an estimate $\hat{\mathcal{K}}$ of the defective set, with some aver-
age probability of error $\epsilon$. We want to choose our tests in such a way that $\epsilon$ is
small, while keeping T as low as possible.

Traditionally, this has been seen as a combinatorial problem: given N and
K, one aims to find an $N \times T$ testing matrix $X = (x_{it})$ such that all $\binom{N}{K}$ possible
defective sets $\mathcal{K}$ give a different sequence of test results $y_1, \dots, y_T$. This gives
a zero error probability $\epsilon = 0$, and one is interested in how small T can be
made. (See, for example, the textbook of Du and Hwang [70] for more details
on the combinatorial approach to group testing.)
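For very small instances the zero-error property can be checked exhaustively. The sketch below is an illustration with hypothetical helper names: it represents the testing matrix as a list of N rows of length T, computes the deterministic test results for each candidate defective set, and checks that no two sets of size K give the same results.

```python
from itertools import combinations


def outcomes(X, defective):
    """Deterministic results y_1..y_T: y_t = 1 iff test t contains a defective."""
    T = len(X[0])
    return tuple(int(any(X[i][t] for i in defective)) for t in range(T))


def separates_all(X, K):
    """True if all size-K defective sets give distinct result sequences."""
    N = len(X)
    seen = set()
    for defective in combinations(range(N), K):
        y = outcomes(X, defective)
        if y in seen:
            return False
        seen.add(y)
    return True


# N = 4 items, T = 2 tests: item i is pooled according to the binary digits of i.
X_binary = [[0, 0], [0, 1], [1, 0], [1, 1]]
# N = 4 items tested individually (T = 4).
X_individual = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

print(separates_all(X_binary, 1), separates_all(X_binary, 2),
      separates_all(X_individual, 2))
```

The binary design identifies a single defective among four items in two tests, but it cannot separate pairs: the sets {0, 3} and {1, 2} both make every test positive.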

However, an alternative approach is to use random pools. That is, we
set $x_{it}$ to be random 0s or 1s - typically, $x_{it} = 1$ with some probability p
IID across i and t, where p may depend on K and N. One then investigates
how big T must be compared to N and K in order to keep the average error
probability $\epsilon$ low.

Recent progress has been made on this channel coding approach by Atia 
and Saligrama [71], by comparing the problem to the classical problem of 
channel coding, first studied by Shannon [4]. (See Chapter 1 for more details 
on channel coding.) 

The two figures below show an interesting similarity between the two 
problems. Note that our goals are slightly different, though - in channel cod- 
ing we wish to maximise the number of messages M for large blocklengths 
T; whereas in group testing we wish to minimise the number of tests T for a 
large number of items N. 



In single-user point-to-point channel coding, M messages are encoded into
codewords $(x_{m1}, \dots, x_{mT})$ of length T. After being sent through some chan-
nel, the codewords are received as $(y_1, \dots, y_T)$, and the original message is
estimated as $\hat{m}$, with the hope that the average error probability $\epsilon$ will be small.

Shannon's celebrated channel coding theorem [4, Theorem 11] (see Theo-
rem 1.11 of this thesis) tells us how large we can make M compared to T, while
still being sure that the error probability stays small. Shannon's breakthrough
was to study a random coding scheme, where the $x_{mt}$ are all IID according to
some distribution X. One way to phrase the achievability part of Shannon's
theorem - Shannon offered a similar phrasing as an alternative in his original
paper [4, Theorem 12] - is the following:

Theorem 6.2 (Shannon's channel coding theorem). Consider a communications
channel $(\mathcal{X}, \mathcal{Y}, p(y \mid x))$. Let $M^* = M^*(T, \epsilon)$ be the maximum number of messages
that can be sent through the channel with blocklength T and error probability at most
$\epsilon \in (0,1)$. Then

$$M^* \ge 2^{T \max_X I(X:Y) + o(T)} \quad \text{as } T \to \infty.$$

Atia and Saligrama [71, Theorem III.1] adapted Gallager's proof [10] of
the achievability part of Shannon's channel coding theorem to give a similar
result. This time, we're interested in how many tests T are required to keep
the error probability arbitrarily low.

Theorem 6.3. Consider group testing in the deterministic case.

Let $T^* = T^*(N, K, \epsilon)$ be the minimum number of tests necessary to identify
K defects among N items with error probability at most $\epsilon \in (0, 1)$. Then
$T^* \le \bar{T} + o(\log N)$ as $N \to \infty$, where

$$\bar{T} = \min_p \max_{\mathcal{L} \subset \mathcal{K}} \frac{\log_2 \binom{N-K}{|\mathcal{L}|} \binom{K}{|\mathcal{L}|}}{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}.$$

(It's worth noting that the term inside the maximisation depends only on
the cardinality $|\mathcal{L}|$ of $\mathcal{L}$, not on $\mathcal{L}$ itself.)

In channel coding, the main interest is in finding the value of $M^*$ for dif-
ferent channels - the limit

$$\lim_{T \to \infty} \frac{\log_2 M^*}{T} = \max_X I(X : Y) =: c$$

exists and is called the channel capacity (see Definitions 1.6 and 1.10).

In group testing, for the deterministic case, Atia and Saligrama [71, Theo-
rem V.1] showed that we have

$$\bar{T} = O(K \log N) \quad \text{as } K \to \infty \text{ and } N \to \infty,$$

a bound that Sejdinovic and Johnson [72, Theorem 2] improved to

$$\bar{T} \sim eK \frac{\log(K(N-K))}{\log K} = eK \left(1 + \frac{\log(N-K)}{\log K}\right).$$

(Here, as elsewhere, we use $f(x) \sim g(x)$ to mean that $f(x)/g(x) \to 1$.)

Here, we will attempt to find to what range of group testing channels the
result of Atia and Saligrama can be extended, and some bounds on $\bar{T}$ - and
hence $T^*$ - for those channels. We also investigate further insights that channel
coding can give to group testing.



6.3 Channels 

In channel coding, many different types of communication can be modelled
by using different channels. Recall from Definition 1.1 that a communication
channel is defined by stating what inputs $x \in \mathcal{X}$ and outputs $y \in \mathcal{Y}$ the
channel can have, and what the probability $p(y \mid x)$ of each output is given
each input.

We want to do the same for group testing. The input is already con-
strained: there are N items each of which can be in ($x_i = 1$) or not in ($x_i = 0$)
the pool, so the input alphabet is $\{0, 1\}^N$. So we must define the output al-
phabet and the probability function.

We will assume that in a testing pool there is no 'order' to the items, nor
will any elements not placed in the pool affect the outcome of the test, nor can
we distinguish between the items other than whether or not they are defective.
Hence, the outcome can only depend on two things, the number of items in
the test pool n, and the number of those items that are defective k.

Definition 6.4. A group testing channel consists of

• an output alphabet $\mathcal{Y}$,

• a probability transition function $p(y \mid n,k)$ relating the number of items
n and defectives k in a testing pool to the test outcome y.



The problem of group testing is then to come up with test designs that
have a low error probability for as few tests as possible.

Definition 6.5. A testing pool for N items consists of a vector
$\mathbf{x} = (x_i) \in \{0, 1\}^N$, where $x_i = 1$ denotes that item i is included in the pool and $x_i = 0$
denotes that item i is not included in the pool. We define $n := |\{i : x_i = 1\}|$ to
be the total number of items in the pool and $k := |\{i \in \mathcal{K} : x_i = 1\}|$ to be the
number of defective items in the pool.

A test design of T tests for N items consists of

• a sequence $(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$ of T testing pools (which can be summarized
by the testing matrix $X = (x_{it}) \in \{0, 1\}^{N \times T}$);

• a defective set detection function $\hat{\mathcal{K}} : \mathcal{Y}^T \to [N]^{(K)}$, where $[N]^{(K)}$ is the col-
lection of subsets of {1, 2, ..., N} of size K.

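We can now describe the deterministic case discussed earlier as an example
of a group testing channel under Definition 6.4.

Definition 6.6. The deterministic channel has output alphabet $\mathcal{Y} = \{0, 1\}$ and
probability transition function

$$p(1 \mid n,k) = \begin{cases} 0 & \text{if } k = 0, \\ 1 & \text{if } k \ge 1, \end{cases} \qquad p(0 \mid n,k) = \begin{cases} 1 & \text{if } k = 0, \\ 0 & \text{if } k \ge 1. \end{cases}$$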
Atia and Saligrama [71, Subsection II-C] also studied two ways in which 
error could be introduced into group testing, which they called the additive 
and dilution models. They showed that their main result (Theorem 6.3) also 
holds true for these two channels. 

In the additive model, a negative pool can actually return a false positive 
result, with some fixed probability q. This could happen in our interference 
graph example from Section 6.1 if we included a noise term Z. This model can 
be recast as an example of a group testing channel. 

Definition 6.7. The addition channel with addition probability q > 0 has output
alphabet $\mathcal{Y} = \{0, 1\}$ and probability transition function

$$p(1 \mid n,k) = \begin{cases} q & \text{if } k = 0, \\ 1 & \text{if } k \ge 1, \end{cases} \qquad p(0 \mid n,k) = \begin{cases} 1-q & \text{if } k = 0, \\ 0 & \text{if } k \ge 1. \end{cases}$$

Atia and Saligrama [71, Table 1] calculated that for the addition channel,
as $K \to \infty$ and $N \to \infty$, we have

$$\bar{T} = O\left(\frac{K \log N}{1 - q}\right).$$

Using the work of Sejdinovic and Johnson [72, Theorem 6] we can improve
this to

$$\bar{T} \sim eK \frac{\log(K(N-K))}{\log(1/q)}$$

by setting u = 0 and optimising $\alpha = 1$ in their equation (22) and rearranging.
(Interestingly, this is discontinuous with the deterministic channel at q = 0.)

The dilution model describes the case where a very small number of defec- 
tive items in a testing pool might be 'drowned out' by the nondefective items. 
Specifically, any defective items in the test may each evade the test indepen- 
dently with some probability u, with the potential to cause false negative re- 
sults. This could happen in our interference graph example from Section 6.1 
if we did not neglect the possibility that superposed interfering signals can 
cancel each other out. 

Definition 6.8. The dilution channel with dilution probability u > 0 has output
alphabet $\mathcal{Y} = \{0, 1\}$ and probability transition function

$$p(1 \mid n,k) = \begin{cases} 0 & \text{if } k = 0, \\ 1 - u^k & \text{if } k \ge 1, \end{cases} \qquad p(0 \mid n,k) = \begin{cases} 1 & \text{if } k = 0, \\ u^k & \text{if } k \ge 1. \end{cases}$$

Atia and Saligrama [71, Table 1] calculated that for the dilution channel
we have

$$\bar{T} = O\left(\frac{K \log N}{(1-u)^2}\right).$$

Again, using the work of Sejdinovic and Johnson [72, Theorem 6] we can im-
prove this to

$$\bar{T} \sim e^{1 + u + f(u)} K \left(1 + \frac{\log(N-K)}{\log K}\right),$$

where

$$0 < f(u) \le \frac{u^2}{1-u} = u^2 + u^3 + \cdots;$$

we do this by optimising $\alpha = 1/(1-u)$ in their equation (24) and rearranging.
In other words, the dilution channel requires about $e^u \approx 1 + u$ times as many
tests as the deterministic channel, for small u.

Sejdinovic and Johnson also considered a channel that combines the addi- 
tive and dilutive noise models. 

Definition 6.9. The addition/dilution channel with addition probability q > 0
and dilution probability u > 0 has output alphabet $\mathcal{Y} = \{0, 1\}$ and probability
transition function

$$p(1 \mid n,k) = \begin{cases} q & \text{if } k = 0, \\ 1 - u^k & \text{if } k \ge 1, \end{cases} \qquad p(0 \mid n,k) = \begin{cases} 1-q & \text{if } k = 0, \\ u^k & \text{if } k \ge 1. \end{cases}$$

Working on the unproven assumption that Atia and Saligrama's result
(Theorem 6.3) also holds for the addition/dilution channel, Sejdinovic and
Johnson [72, Theorem 3] prove a similar result for the addition/dilution chan-
nel. (Although note that Sejdinovic and Johnson [72, reference 1] were work-
ing from an earlier preprint of Atia and Saligrama's paper [71, version 2]. This
had a different, and less rigorous, proof based on typical set decoding, rather
than Gallager's maximum likelihood approach as in the most recent version
[71, version 4].)

Atia and Saligrama and Sejdinovic and Johnson defined their channels in 
terms of complicated Boolean sums and products of random vectors and ma- 
trices with the testing matrix X. But our definition in terms of probability 
transition functions makes the behaviour of such channels clearer, and should 
allow the proof of more universal theorems. It also makes it much easier to 
define new channels to model testing behaviour - and many other existing 
models can be reformulated as group testing channels. 

Definition 6.10. The erasure channel is a model that works like the determin-
istic channel, but fails to produce a result with some fixed erasure probability
e. That is, $\mathcal{Y} = \{0, ?, 1\}$ and

$$p(1 \mid n,k) = \begin{cases} 0 & \text{if } k = 0, \\ 1-e & \text{if } k \ge 1, \end{cases} \qquad p(? \mid n,k) = e, \qquad p(0 \mid n,k) = \begin{cases} 1-e & \text{if } k = 0, \\ 0 & \text{if } k \ge 1. \end{cases}$$

The dilution threshold channel only gives a positive result if a sufficient proportion
of the tested items are defective, above some threshold $\theta \in (0,1)$. That is,
$\mathcal{Y} = \{0, 1\}$ and

$$p(1 \mid n,k) = \begin{cases} 0 & \text{if } k/n < \theta, \\ 1 & \text{if } k/n \ge \theta, \end{cases} \qquad p(0 \mid n,k) = \begin{cases} 1 & \text{if } k/n < \theta, \\ 0 & \text{if } k/n \ge \theta. \end{cases}$$

The counting channel gives as the output the number of defective items in
the set. That is, the probability transition function $p(y \mid n,k)$ is defined implicitly
by the relation

$$y = k.$$

It's worth noting that group testing under the counting channel model is
equivalent to 0-1 compressed sensing with sparsity exactly K. (A model equiv-
alent to the counting channel has previously been studied by Shapiro and Fine
[73], Erdős and Rényi [74], and others - for more details see the textbook of
Du and Hwang [70, Section 11.2].)

The overflow channel gives as a result exactly how many defective items
were in the test, up to some limit l. That is, $\mathcal{Y} = \{0, 1, 2, \dots, l\}$ and the probability
function is defined implicitly by the relation

$$y = \min\{k, l\}.$$

(When l = n, this is equivalent to the counting channel; when l = 1 this
is equivalent to the deterministic channel; when l = 2 this is the message
collision model of Capetanakis [67].)

The symmetric channel gives a negative result if all items are nondefective,
a positive result if all items are defective, and an uncertain result if there is
a mixture of defective and nondefective items. That is, $\mathcal{Y} = \{0, ?, 1\}$ and the
relation

$$y = \begin{cases} 0 & \text{if } k = 0, \\ ? & \text{if } 0 < k < n, \\ 1 & \text{if } k = n. \end{cases}$$

(A model equivalent to the symmetric channel has previously been studied
by Sobel, Kumar, and Blumenthal [75] and Hwang [76].)

These are just a few examples - many more realistic error models for group 
testing can be formulated as channels, and perhaps wider use can be made of 
this new concept. 
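One way to see how little is needed to specify a group testing channel is to write a few of them as functions. In the sketch below, which is an illustration rather than part of the thesis, a channel is a Python function taking (n, k) and returning a dictionary from outputs to probabilities. The transition functions follow the verbal descriptions above; in particular the overflow output is taken to be the count capped at the limit, and all function names are hypothetical.

```python
def deterministic(n, k):
    """Positive exactly when at least one defective item is in the pool."""
    return {1: 1.0} if k >= 1 else {0: 1.0}


def addition(q):
    """A pool with no defectives gives a false positive with probability q."""
    def p(n, k):
        return {1: 1.0} if k >= 1 else {0: 1.0 - q, 1: q}
    return p


def dilution(u):
    """Each defective evades the test independently with probability u."""
    def p(n, k):
        return {0: u ** k, 1: 1.0 - u ** k} if k >= 1 else {0: 1.0}
    return p


def erasure(e):
    """Deterministic result, replaced by '?' with probability e."""
    def p(n, k):
        return {(1 if k >= 1 else 0): 1.0 - e, "?": e}
    return p


def overflow(limit):
    """Reports the number of defectives, capped at the given limit."""
    def p(n, k):
        return {min(k, limit): 1.0}
    return p


def symmetric(n, k):
    """0 if no item is defective, 1 if all are, '?' for a mixture."""
    if k == 0:
        return {0: 1.0}
    return {1: 1.0} if k == n else {"?": 1.0}


channels = [deterministic, addition(0.1), dilution(0.5),
            erasure(0.2), overflow(2), symmetric]
print(dilution(0.5)(4, 2), overflow(2)(10, 5), symmetric(3, 1))
```

A new error model is then just another function of n and k, which is the flexibility the probability-transition formulation is meant to provide.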

6.4 When only defects matter 

Note that for many of the group testing channels we have mentioned - includ- 
ing all those studied by Atia and Saligrama and by Sejdinovic and Johnson - 
the output depends only on k, the number of defects in the test, and not on 
n, the total number of items in the test. In other words, 'only defects matter', 
and the number of nondefects in the test is irrelevant. 

Definition 6.11. A channel $(\mathcal{Y}, p)$ whose probability function
$p(y \mid n,k) = p(y \mid k)$ is dependent only on k and not on n is said to have the only-defects-
matter property.

In the examples from Definition 6.10 and earlier, the deterministic, addi-
tion, dilution, addition/dilution, erasure, counting, and overflow channels
have the only-defects-matter property. The dilution threshold and symmetric
channels do not have the only-defects-matter property.
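The property can be checked numerically for pools up to a given size. The sketch below is an illustration with hypothetical names: a channel is a function taking (n, k) and returning a dictionary from outputs to probabilities, and the check compares the output distribution across all pool sizes n for each fixed number of defectives k.

```python
def only_defects_matter(p, N):
    """Check that p(n, k) is the same for every pool size n, for pools up to N."""
    for k in range(N + 1):
        dists = [p(n, k) for n in range(max(k, 1), N + 1)]
        if any(d != dists[0] for d in dists):
            return False
    return True


def deterministic(n, k):
    """Positive exactly when at least one defective item is in the pool."""
    return {1: 1.0} if k >= 1 else {0: 1.0}


def dilution_threshold(theta):
    """Positive only when the proportion k/n of defectives reaches theta."""
    def p(n, k):
        return {1: 1.0} if k / n >= theta else {0: 1.0}
    return p


print(only_defects_matter(deterministic, 5),
      only_defects_matter(dilution_threshold(0.5), 5))
```

This is a finite check rather than a proof, but it fails as expected for the dilution threshold channel, where one defective in a pool of two is detected and one defective in a pool of three is not.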

Another way to state the only-defects-matter property is that for a channel
where only defects matter, we can make the simplification

$$p(y \mid \mathbf{x}) = p(y \mid \mathbf{x}_{\mathcal{K}}).$$

Making this simplification is crucial to the proof of Atia and Saligrama [71,
for example Section III.A]. Indeed, this is the only specific point about the
deterministic, additive, and dilution channels that Atia and Saligrama use in
their proof. Hence, we have the following:

Theorem 6.12. Consider a group testing channel where only defects matter. Let
$T^* = T^*(N, K, \epsilon)$ be the minimum number of tests necessary to identify K defects
among N items with error probability at most $\epsilon \in (0, 1)$. Then $T^* \le \bar{T} + o(\log N)$
as $N \to \infty$, where

$$\bar{T} = \min_p \max_{\mathcal{L} \subset \mathcal{K}} \frac{\log_2 \binom{N-K}{|\mathcal{L}|} \binom{K}{|\mathcal{L}|}}{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}.$$

The proof is the same as that of Atia and Saligrama's theorem [71, Theorem
III.1]. We briefly outline the proof here.

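Sketch proof. We need to analyse the error probability of a test design. To do
this, we will analyse the probability that our estimated defective set
$\hat{\mathcal{K}}$, of cardinality $K$, overlaps with the true defective set
$\mathcal{K}$ on a set $\mathcal{L}$.

Given a set $\mathcal{L}$, there are $\binom{N-K}{K-|\mathcal{L}|}$ such sets
$\hat{\mathcal{K}}$, and there are $\binom{K}{|\mathcal{L}|}$ sets $\mathcal{L}$
of each possible cardinality. Using a technique similar to Gallager's proof of
Shannon's coding theorem, we can bound the probability of each such error
event. For this bound to tend to zero, we require

$$T > \frac{\log_2 \binom{N-K}{K-|\mathcal{L}|} \binom{K}{|\mathcal{L}|}}{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}.$$

We need this to be true for every $\mathcal{L} \subset \mathcal{K}$, and can
optimise the result over the test design parameter $p$. □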
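
The counting behind the numerator can be checked by brute force. The sketch below (an illustration added here, with hypothetical function names and small parameter values) enumerates every candidate set of size $K$ and groups the candidates by how many items they share with the true defective set. The number sharing exactly $l$ items should be $\binom{N-K}{K-l}\binom{K}{l}$, and the totals should add up to $\binom{N}{K}$.

```python
from itertools import combinations
from math import comb


def overlap_counts(N, K):
    """Count size-K candidate sets by the size of their overlap with the true set."""
    true_set = frozenset(range(K))  # by symmetry, any fixed true set will do
    counts = [0] * (K + 1)
    for candidate in combinations(range(N), K):
        counts[len(true_set.intersection(candidate))] += 1
    return counts


N, K = 8, 3
for overlap, count in enumerate(overlap_counts(N, K)):
    formula = comb(N - K, K - overlap) * comb(K, overlap)
    print(f"overlap {overlap}: enumerated {count}, formula {formula}")
```

Enumeration is exponential in $K$, so this is only practical for toy values of $N$ and $K$.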

Identifying the only-defects-matter property as the crucial factor for proving
Theorem 6.12 means that Sejdinovic and Johnson's bound on T for the
addition/dilution channel is now rigorously proven.

Whether Theorem 6.12 - or a similar theorem - holds for channels without 
the only-defects-matter property is an open problem. 

Calculating, or finding good bounds for, the value of $\overline{T}$ or $T^*$
for the erasure, counting, and overflow channels is also an open problem.



6.5 Converse part and adaptive testing 

Atia and Saligrama [71, Theorem IV.1] also provide a lower bound on the
number of tests needed in group testing. The proof is along the lines of
Shannon's converse to the channel coding theorem, and uses Fano's inequality.
As before, the Atia-Saligrama proof in fact applies to all channels where only
defects matter.

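Theorem 6.13. Consider a group testing channel where only defects matter. Let
$T^* = T^*(N, K, \epsilon)$ be the minimum number of tests necessary to identify
$K$ defects among $N$ items with error probability at most $\epsilon \in (0, 1)$.
Then $T^* \ge \underline{T} - o(\log N)$ as $N \to \infty$, where

$$\underline{T} = \min_p \max_{\mathcal{L} \subset \mathcal{K}} \frac{\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}}{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}.$$

Note that, unlike for channel coding, the bounds $\underline{T}$ and
$\overline{T}$ do not coincide. Therefore, the exact number of tests needed for
group testing to work is not known.

We now present a proof of the converse that is slightly simpler than Atia
and Saligrama's. Our proof is based on theirs [71, Theorem IV.1], with some
simplifications based on the standard proof of the converse of Shannon's
coding theorem, as exposited by Cover and Thomas [6, Section 7.9].

Proof. Suppose a genie reveals to us some subset $\mathcal{L} \subset \mathcal{K}$
of the defective set, leaving us to work out the remaining $K - |\mathcal{L}|$
defective items. Given $\mathcal{L}$, let $\mathcal{K}$ be the random defective
set chosen uniformly among the possible $\binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}$
sets of size $K$ of which $\mathcal{L}$ is a subset.
Then we have the following:

$$\begin{aligned}
\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}
  &= H(\mathcal{K} \mid \mathcal{L}) && \text{(definition of entropy)} \\
  &= H(\mathcal{K} \mid \hat{\mathcal{K}}, \mathcal{L}) + I(\mathcal{K} : \hat{\mathcal{K}} \mid \mathcal{L}) && \text{(definition of mutual information)} \\
  &\le 1 + \epsilon \log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} + I(\mathcal{K} : \hat{\mathcal{K}} \mid \mathcal{L}) && \text{(Fano's inequality)} \\
  &\le 1 + \epsilon \log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} + I(\mathbf{X}_{\mathcal{K} \setminus \mathcal{L}} : \mathbf{Y} \mid \mathbf{X}_{\mathcal{L}}), && \text{(6.1)}
\end{aligned}$$

where the final step uses the data-processing inequality and the fact that
only defects matter.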

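We can bound the mutual information term in (6.1) as

$$\begin{aligned}
I(\mathbf{X}_{\mathcal{K} \setminus \mathcal{L}} : \mathbf{Y} \mid \mathbf{X}_{\mathcal{L}})
  &= H(\mathbf{Y} \mid \mathbf{X}_{\mathcal{L}}) - H(\mathbf{Y} \mid \mathbf{X}_{\mathcal{K}}) && \text{(definition of mutual information)} \\
  &= \sum_{t=1}^{T} \big( H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathbf{X}_{\mathcal{L}}) - H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathbf{X}_{\mathcal{K}}) \big) && \text{(chain rule)} \\
  &= \sum_{t=1}^{T} \big( H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathbf{X}_{\mathcal{L}}) - H(Y_t \mid X_{\mathcal{K}t}) \big) && \text{(memorylessness)} \\
  &\le \sum_{t=1}^{T} \big( H(Y_t \mid X_{\mathcal{L}t}) - H(Y_t \mid X_{\mathcal{K}t}) \big) && \text{(conditioning reduces entropy)} \\
  &= \sum_{t=1}^{T} I(X_{\mathcal{K} \setminus \mathcal{L}, t} : Y_t \mid X_{\mathcal{L}t}) && \text{(definition of mutual information)} \\
  &= T \, I(X_{\mathcal{K} \setminus \mathcal{L}} : Y \mid X_{\mathcal{L}}). && \text{(6.2)}
\end{aligned}$$

(Here, we write $X_{\mathcal{K}t} := (X_{it} : i \in \mathcal{K})$ for fixed $t$.)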
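But we can rewrite this mutual information as

$$I(X_{\mathcal{K} \setminus \mathcal{L}} : Y \mid X_{\mathcal{L}}) = I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y) - I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}) = I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y), \qquad \text{(6.3)}$$

since $X_{\mathcal{K} \setminus \mathcal{L}}$ and $X_{\mathcal{L}}$ are independent.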
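
The identity (6.3) can be sanity-checked numerically on a toy case. The sketch below is an illustration under assumptions chosen for this example: two defective items, with item 1 playing the role of $\mathcal{K} \setminus \mathcal{L}$ and item 2 the role of $\mathcal{L}$; independent Bernoulli($p$) inclusion in a single test; and the noiseless channel $Y = X_1 \vee X_2$. For that case a direct calculation gives $H(Y \mid X_2) = (1-p)\,h(p)$, where $h$ is the binary entropy function, and both sides of (6.3) should equal this value.

```python
from itertools import product
from math import isclose, log2


def joint_pmf(p):
    """Joint pmf of (X1, X2, Y): X1, X2 i.i.d. Bernoulli(p), Y = X1 OR X2."""
    pmf = {}
    for x1, x2 in product((0, 1), repeat=2):
        pmf[(x1, x2, x1 | x2)] = (p if x1 else 1 - p) * (p if x2 else 1 - p)
    return pmf


def entropy(pmf, coords):
    """Entropy in bits of the marginal on the given coordinate indices."""
    marginal = {}
    for outcome, prob in pmf.items():
        key = tuple(outcome[i] for i in coords)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(q * log2(q) for q in marginal.values() if q > 0)


def mutual_information(pmf, a, b, given=()):
    """I(A : B | C) = H(A, C) + H(B, C) - H(A, B, C) - H(C)."""
    return (entropy(pmf, a + given) + entropy(pmf, b + given)
            - entropy(pmf, a + b + given) - entropy(pmf, given))


p = 0.3
pmf = joint_pmf(p)
conditional = mutual_information(pmf, (0,), (2,), given=(1,))  # I(X1 : Y | X2)
joint = mutual_information(pmf, (0,), (1, 2))                  # I(X1 : X2, Y)
independent = mutual_information(pmf, (0,), (1,))              # I(X1 : X2)
closed_form = (1 - p) * (-p * log2(p) - (1 - p) * log2(1 - p))
print(conditional, joint - independent, closed_form)
```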

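Putting together (6.3), (6.2), and (6.1), we get

$$\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} \le 1 + \epsilon \log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} + T \, I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y).$$

We can rearrange this to get

$$\epsilon \ge 1 - T \, \frac{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}{\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}} - \frac{1}{\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}}.$$

Sending $N \to \infty$, it is clear that we require

$$T \ge \frac{\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|}}{I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y)}$$

to get the error probability arbitrarily low.

This has to be true for all $\mathcal{L} \subset \mathcal{K}$, and we can
optimise over the test inclusion parameter $p$. This gives the result. □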
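
To get a feel for the two expressions, the sketch below evaluates the leading terms $\underline{T}$ and $\overline{T}$ for one concrete case. Everything specific here is an assumption made for illustration: the channel is the noiseless one ($Y$ is the OR of the defectives' inclusion bits), items are included in tests independently with probability $p$, the minimum over $p$ is approximated by a grid search, and the function names are hypothetical. Under these assumptions the mutual information depends on $\mathcal{L}$ only through $l = |\mathcal{L}|$, and a direct calculation gives $I(X_{\mathcal{K} \setminus \mathcal{L}} : X_{\mathcal{L}}, Y) = (1-p)^{l} \, h\big((1-p)^{K-l}\big)$, with $h$ the binary entropy function.

```python
from math import comb, log2


def binary_entropy(q):
    """Entropy in bits of a Bernoulli(q) random variable."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)


def noiseless_information(p, K, l):
    """I(X_{K minus L} : X_L, Y) for Y = OR of K Bernoulli(p) bits, |L| = l."""
    return (1 - p) ** l * binary_entropy((1 - p) ** (K - l))


def ratio(numerator_bits, information_bits):
    """Numerator over information, treating zero information as infinitely many tests."""
    return float("inf") if information_bits == 0.0 else numerator_bits / information_bits


def leading_terms(N, K, grid=999):
    """Grid-search the min over p of the max over |L| in both bounds (needs N >= 2K)."""
    lower = upper = float("inf")
    for i in range(1, grid + 1):
        p = i / (grid + 1)
        info = [noiseless_information(p, K, l) for l in range(K)]
        lower = min(lower, max(
            ratio(log2(comb(N - l, K - l)), info[l]) for l in range(K)))
        upper = min(upper, max(
            ratio(log2(comb(N - K, K - l) * comb(K, l)), info[l]) for l in range(K)))
    return lower, upper


for K in (1, 2, 5):
    lower, upper = leading_terms(100, K)
    print(f"K={K}: lower leading term {lower:.2f}, upper leading term {upper:.2f}")
```

These numbers are only the leading terms: the theorems hold up to $o(\log N)$ corrections, so at finite $N$ the two values need not appear in the order the names suggest.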



So far, we have been looking at nonadaptive group testing, where all the test 
pools are decided on ahead of time. 

Instead, we could consider adaptive group testing where tests are performed
sequentially, and the makeup of a testing pool can depend on the results of
previous tests. That is, the $x_{it}$ are functions of the previous test
outcomes $(y_1, y_2, \dots, y_{t-1})$.

Definition 6.14. An adaptive test design of $T$ tests for $N$ items consists of a
sequence $(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_T)$ of $T$ testing pools,
where the testing pool for the $t$th test
$\mathbf{x}_t = \mathbf{x}_t(y_1, y_2, \dots, y_{t-1})$ can depend on earlier
test outcomes.
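
As a concrete picture of an adaptive design, the sketch below implements binary splitting, a classical adaptive strategy, for the simplest setting: exactly one defective item and the noiseless channel. It is an illustration only and is not taken from this chapter's analysis; the function names are hypothetical. The point to notice is that `binary_splitting_pool` computes the next pool purely from the list of earlier outcomes, which is exactly the dependence the definition allows.

```python
def noiseless_outcome(pool, defectives):
    """Deterministic channel: positive iff the pool contains a defective item."""
    return int(any(item in defectives for item in pool))


def binary_splitting_pool(n_items, outcomes):
    """Next pool as a function of past outcomes, assuming exactly one defective."""
    low, high = 0, n_items  # the defective lies somewhere in range(low, high)
    for y in outcomes:
        mid = (low + high) // 2
        if y == 1:
            high = mid  # the tested lower half contained the defective
        else:
            low = mid   # so it must be in the untested upper half
    return low, high, list(range(low, (low + high) // 2))


def find_single_defective(n_items, defectives):
    """Run the adaptive design until one candidate remains; return (item, tests used)."""
    outcomes = []
    while True:
        low, high, pool = binary_splitting_pool(n_items, outcomes)
        if high - low == 1:
            return low, len(outcomes)
        outcomes.append(noiseless_outcome(pool, defectives))


print(find_single_defective(16, {11}))
```

A nonadaptive design, by contrast, would have to fix all of its pools before seeing any outcome.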

This is analogous to channel coding with feedback, where the encoding
function $x_t$ at time $t$ can depend on previous channel outputs
$y_1, y_2, \dots, y_{t-1}$.

As Shannon [77, page 15] showed, for discrete memoryless channels, feed- 
back does not increase the channel capacity. (Although it can help in simpli- 
fying encoding and decoding [6, Section 7.12].) 

Due to the non-tightness of the bounds on testing in the nonadaptive case, 
we will not be able to show that adaptive group testing requires the same num- 
ber of tests as nonadaptive testing, but we will be able to show that it obeys 
the same lower bound and requires no more tests than the nonadaptive case. 

This is (as far as we are aware) the first application of information theoretic
techniques to adaptive group testing.

Theorem 6.15. Let $T^*_{\mathrm{non}}$ and $T^*_{\mathrm{ad}}$ (dependent on $N$,
$K$, and $\epsilon$) be the minimum number of tests necessary to identify $K$
defects among $N$ items with error probability at most $\epsilon \in (0, 1)$ for
nonadaptive and adaptive group testing respectively.
Then, as $N \to \infty$, we have the inequalities

$$\underline{T} - o(\log N) \le T^*_{\mathrm{ad}} \le T^*_{\mathrm{non}} \le \overline{T} + o(\log N),$$

where $\overline{T}$ and $\underline{T}$ are as in Theorems 6.3 and 6.13.

Proof. The third inequality was proven in Theorem 6.3. The second inequality 
is trivial, as nonadaptive group testing is merely a special case of adaptive 
group testing where the tester chooses to ignore the information of previous 
test results. 

To prove the first inequality, we adapt the proof of Theorem 6.13, and Shan- 
non's proof that feedback fails to improve channel capacity [77, Theorem 6], 
as exposited by Cover and Thomas [6, Theorem 7.12.1]. 

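We begin exactly the same way as the proof of Theorem 6.13, to get

$$\log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} \le 1 + \epsilon \log_2 \binom{N-|\mathcal{L}|}{K-|\mathcal{L}|} + I(\mathcal{K} : \hat{\mathcal{K}} \mid \mathcal{L}). \qquad \text{(6.4)}$$

We again use the data processing inequality (but in a slightly different way)
and the only-defects-matter property on the mutual information term in (6.4),
to write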

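$$\begin{aligned}
I(\mathcal{K} : \hat{\mathcal{K}} \mid \mathcal{L})
  &\le I(\mathcal{K} \setminus \mathcal{L} : \mathbf{Y} \mid \mathcal{L}) && \text{(data processing and only-defects-matter)} \\
  &= H(\mathbf{Y} \mid \mathcal{L}) - H(\mathbf{Y} \mid \mathcal{K}) && \text{(definition of mutual information)} \\
  &= \sum_{t=1}^{T} \big( H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathcal{L}) - H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathcal{K}) \big) && \text{(chain rule)} \\
  &= \sum_{t=1}^{T} \big( H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathcal{L}, X_{\mathcal{L}t}) - H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathcal{K}, X_{\mathcal{K}t}) \big) && \text{($X_{it}$ a function of $Y_1, \dots, Y_{t-1}$ and $i$)} \\
  &\le \sum_{t=1}^{T} \big( H(Y_t \mid X_{\mathcal{L}t}) - H(Y_t \mid Y_1, \dots, Y_{t-1}, \mathcal{K}, X_{\mathcal{K}t}) \big) && \text{(conditioning reduces entropy)} \\
  &= \sum_{t=1}^{T} \big( H(Y_t \mid X_{\mathcal{L}t}) - H(Y_t \mid X_{\mathcal{K}t}) \big), && \text{(6.5)}
\end{aligned}$$

where (6.5) is because, conditional on $X_{\mathcal{K}t}$, we know that $Y_t$ is
independent of previous $Y$s and the defective set $\mathcal{K}$.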

We can now pick back up with the proof of Theorem 6.13, two lines above 
(6.2), to complete the proof. □ 



It is tempting to wonder if in fact nonadaptive and adaptive group testing
require exactly the same number of tests,
$T^*_{\mathrm{ad}} = T^*_{\mathrm{non}}$. This would be in
contrast to results for the deterministic model in the zero-error case, where it
is known that adaptive group testing can be performed in strictly fewer tests
than nonadaptive (in the $K \to \infty$, $N \to \infty$ regime) [70, page 139].

Similarly, Shannon showed that feedback for channel coding does help in 
the zero-error case, but not the arbitrarily-small-error case [77]. 

6.6 Further work 

The investigation of applying information theoretic ideas to group testing is 
at a very early stage and there are many open questions. 

Can we find an analogue of Theorem 6.12 where nondefects also matter?

That is, can we drop the only-defects-matter property? The proof of
Theorem 6.12 relies on the fact that, given
$\mathcal{K} \cap \hat{\mathcal{K}} =: \mathcal{L}$, the random variables
$P(\mathbf{y} \mid \mathbf{X}, \mathcal{K} \text{ defective})$ and
$P(\mathbf{y} \mid \mathbf{X}, \hat{\mathcal{K}} \text{ defective})$ are
conditionally independent. While this is no longer true when nondefects
matter too, they are still conditionally independent given $\mathcal{L}$ and
the total number of items $n$ in the test. Perhaps this promises a way
forwards.

What is $T^*$ for different channels? Is there a reliable method to calculate,
exactly or approximately, $T^*$ for different channels? Is there an easy way to
do so? What are the minimising choices of $p$?

Can we close the gap between $\underline{T}$ and $\overline{T}$ to tighten our bounds?

Does nonadaptive group testing require more tests than adaptive? That is,
does $T^*_{\mathrm{ad}} = T^*_{\mathrm{non}}$ asymptotically?

The wider view: What other information-theoretic techniques can be useful? 
Sejdinovic and Johnson [72] have tried using message-passing decoding 
algorithms for computationally feasible defective set detection. Cher- 
aghchi and coauthors [78] have used information theoretic techniques 
to study group testing on graphs. What else is there to try? 

These are just a few brief suggestions.

Notes

This chapter has benefited from discussions with Oliver Johnson, Dino Sejdi- 
novic, and Robert Piechocki. 

Group testing was first studied by Dorfman [69] in the context of testing
soldiers' blood for syphilis.

The information theoretic approach to group testing is due to Atia and
Saligrama [71]. A recent paper of Sejdinovic and Johnson was also useful [72].

The textbook of Du and Hwang [70] was useful for material on group test- 
ing. The textbook of Cover and Thomas [6] was useful for material on channel 
coding.

Conclusions and further work



In this thesis, we've examined a number of different ways of combating inter- 
ference in wireless networks. 



• In Chapter 3 we saw how the simple interference-as-noise technique can 
be effective over short hops in well-structured networks. 

• In Chapter 4, we saw how interference alignment can give tight bounds 
on the performance of large random networks. 

• In Chapter 5, we examined the tradeoff between delay and communication
rate when using ergodic interference alignment.

• In Chapter 6, we examined how group testing can help with network 
performance, and how channel coding techniques can help with group 
testing. 



As we've gone through, we have left a number of pointers to open ques- 
tions and further work (in particular, see Sections 3.4, 4.6, 5.6, and 6.6). We 
now have the opportunity to take a brief look at the wider picture. 

• While interference alignment has been an important theoretical break- 
through, it has had few practical benefits to date. The work in Chapter 
5 of this thesis on delay-rate tradeoff is a step in the right direction, but 
more work is needed. How can we reduce the complexity of schemes? 
Can we reduce the amount of channel state information required? Can 
we reduce blocklengths further? Are the schemes robust to errors in 
channel estimation?

• There is no Shannon's channel coding theorem for networks. That is,
there is no general result that tells one how to calculate the capacity
region (or even just the sum-capacity) of a network. (The best current
result is the cutset bound of Cover and Thomas [6, Theorem 15.10.1].)
Even a result for the Gaussian case seems far off - Jafar refers to a result
in the Gaussian case for only interference networks as "the holy grail of
network information theory" [27, page 1].

• Most capacity theorems for networks - including those in this thesis - 
are nonconstructive. That is, no explicit codes are given. Are some cod- 
ing schemes particularly well suited to interference alignment schemes, 
or can standard codes (like low-density parity-check codes) be adapted 
to this new area? 

• Theorems about sum-capacity (such as ours in Chapter 4) do not give 
everything we want to know about a network. After all, a network oper- 
ating solely at its optimal sum-rate may provide very poor performance 
for some unfortunate users. Concepts of fairness and cooperation also 
need to be taken into account. 

• As wireless devices become ever more ubiquitous, many engineering 
problems in this area become ever more severe. Networks must be set 
up 'ad hoc' with little prior knowledge (the example in Section 6.1 shows 
one way of approaching this); also, interference will become a much 
bigger problem, and 'green' technologies with low power consumption
will be important.

• Work on the connection between group testing and channel coding is at 
a very early stage - in Section 6.6 we listed a number of future directions 
for research. Are other mathematical problems that involve inference 
about unknown quantities approachable from an information theoretic 
point of view? How far can Shannon's work at a 1940s telephone
company take us?